Friday, August 12, 2016

Oracle BPM 11g/12c: How to Catch an Event in the Same Process

A customer of mine was kind of surprised that when you throw an event in a component of a SCA composite, that the same component cannot catch that event and act upon it. This is a known limitation, for which there is a work-around, which I will discuss in this article.

The work-around is quite simple: another loosely coupled component in the same composite can listen to the event, so all you have to do is to create a BPEL or BPM process as-a-service that is subscribed to the event, and that interacts with the main process that you want to act upon it.

To show that a component cannot listen to its own event, and that the work-around actually works, I used the following test process. No worries, it looks more complex than it is.

The parent process above takes a parameter as input so that I can let it execute either one of the following three scenarios, which consist of throwing an event and then catch it:

  1. In the same (parent) process model
  2. In a reusable sub-process (called through a Call activity)
  3. In a process as-a-service that is called through a Send / Receive activity


There are 4 parallel flows between the OR-gateways:

  • The top flow has a Wait User activity to make it pause and waits for the event.
  • The second flow has a Call Child Call activity which calls the reusable process below.
  • The third flow has the Send/Receive activities to call the process as-a-service below.
  • The bottom one waits 2 seconds to give one of the other flows time to be activated, and then throws either one of two events, depending on whether I want to test catching it in the parent or in the reusable child (for this you cannot use the same event type, that's why).

Only 1 of the first 3 flows is activated at any time, while the last flow (with the events) is always activated. Furthermore the parent process has an Event Sub-process that listens to the event that is thrown by the Throw Internal Event event.

The reusable child is also very basic. It has a User activity to make it pause and wait for the event. It also has a an Event Sub-process that listens to the event that is thrown by the Throw Internal Event for Child event. If it is activated, it will map some variable to itself (to see something concrete in the audit trail), and then it will withdraw the Wait task.
The (child) process as-a-service does the same as the reusable child, except for that it has a start and end event which makes it an asynchronous BPM process as-a-service.
Now when you start an instance of the parent for each of the 3 scenarios, the result in Enterprise Manager is as below:
The instance at the bottom (1450067) belongs to the scenario where the parent tries to catch the event. Which fails as you can see by the fact that it is still running. And yes, I did make sure the Catch Event is correlated properly to the Start Event. The next instance (1450068) is the one that catches it, but as you can see they both are still running. When clicking on the second one, it somehow figured out that both instances are related, but the first instance won't act upon it.

The third instance (1450069) is that of the scenario where the reusable child tries to catch the event. From the fact that there is no other instance, you can see that it does not even listen to the event.

The fourth instance (1450070) is that of the parent that calls the child process as-a-service. The fifth (top) instance (1450071) is that of the child that catches the event, and then calls back the parent instance. As you can see, those are the only two instances that actually completed. So only in this scenario it actually works.

Friday, July 01, 2016

Oracle SOA: Using Sensors on Optional Elements

In this posting I describe an issue you may run into when configuring composite sensors on optional elements, and why it is good practice to always add a filter that checks if the element is actually present.

If you define a sensor on a composite to record elements that are optional, you may find an error in the logs similar to the below:


If that is the case you probably have a composite that takes a request with one or more optional elements, of which one or more are not provided, which then is transformed by an XSLT. I have only seen it in combination with a Mediator, but don't know if it would happen in case of BPEL or BPM as well.

For some reason (a product limitation, if not a bug) it will try to store the sensor with a null value, which will result in the above error. You will not see the error when the input is being mapped using XPath, as then it will not try to store the sensor.

To make it always work, the solution is to add a filter on the value to make it only store the value if the element is actually there. I will now explain how to do that.

Let's assume you have a payload like this:


If you want a sensor on the secondElement, you can add a sensor by right-mouse clicking the service and choose Configure Sensors:


For this optional element you should configure it as follows:


It may look a bit like overkill to always do it, but then again if you do it right away, you no longer have to worry about it later, as it will always work, XSLT or not. 

Wednesday, June 29, 2016

Why in Oracle BPM/SOA Suite Attaching a Fault Policy to a Synchronous Service Is Not a Good Idea

In this posting I will explain why in the Oracle BPM/SOA Suite you should not attach fault policies to synchronous services.

This posting has been updated on July 1 2016 to explain why the recovery of ServiceA failed.

The other day I investigated some BPM process instance that had an unrecoverable error. It was calling a synchronous service, that on its turn was calling another synchronous service, that on its turn was calling a external, synchronous service exposed through the Oracle Service Bus. That latter call failed (due to a timeout). As all the composites were having a fault policy attached to them, it was expected that the instance was recoverable. Instead there was some JTA transaction error that rolled back all the way up to last dehydration point in the (top level) BPM process, and from there was retried two times before it finally gave up and went into a coma. Big surprise!

What I recommended to them to prevent this in the future, was the following:

  1. For all synchronous services detach the fault policy. Only attach fault policies to asynchronous and fire&forget services,
  2. Where possible, do asynchronous calls from the BPM process (instead of synchronous), 
  3. Wherever possible make all synchronous services idempotent or make them asynchronous / fire&forget.

If you want to understand why, read on!

Let's assume we have the following chain of services:


I created a FailingService that I can let succeed or fail depending on the input.

Now what happens when BPMProcess calls ServicePS with request 'fail' is the following:

  1. ServiceA errors because of the fault thrown by FailingService
  2. BPMProcess and ServicePS both error because of a timeout
  3. Because of fault policy attached to all them, all go into recoverable state (human intervention)



As ServiceA is in a recoverable state, why not try to recover and see if that will fix the flow?

Now what happens when a retry is done from ServiceA with payload 'normal' is the following:

  1. ServiceA completes successfully
  2. However, BPMProcess & ServicePS are still in recoverable state


The explanation for this is that, because ServicePS was already in a recoverable state, it will not receive the response from ServiceA, as it is no longer listening.

Now let's see what happens when we try to recover the instance of ProcessPS and change the payload from 'fail' to 'normal'


  1. Although the payload was changed to 'normal', we still end up with a new errored instance of ServiceA, as the request of the call from A to FailingService did not change with it (i.e. still 'fail')



In the meantime I understand why ServiceA still called the FailingService with payload 'fail' (I picked the wrong call to retry from the drop-down), but even if the call would have been successful, the FailingService would have been called twice, and let's just hope it is idempotent!

To prevent we get more of these duplicate calls, we recover the (top level) BPM instance instead.

Now what happens when a retry is done from BPMProces with payload 'normal' is the following:

  1. There are new (successful) calls to ServicePS -> ServiceA -> FailingService
  2. However, there are still running instances of ServicePS and ServiceA (they are still in a recoverable state)



The explanation being that these instances were still running after the previous (failed) attempt. So now we still have to abort these running instances to prevent duplicate calls. All in all not very convenient.

The solution is to never let an asynchronous service use a fault policy that either initiates human intervention or does 1 or more retries. The point being that in the meantime the consumer will have timed out, and never receive the response even it succeeds later on.

The best layer to handle errors with synchronous services is a layer that has 'knowledge' about the context of the process. Normally that is the business process itself. The reason being that a policy that fits one process may not fit another.

On the other hand, there are some good arguments for not letting system errors bubble all the way up to a business process. Instead you should consider handling it in the next layer below it - in this case being ServicePS - by making all calls from the business process to ServicePS either asynchronous, or fire&forget (the latter when successful continuation of the process is not depending upon the call). ServicePS will then handles the error using fault policies. You have two options to recover:

  • You recover the instance of ProcessPS, or (when that fails for whatever reason) 
  • Abort the instance of ProcessPS, and do an alter flow on the business process by moving the token from the Receive back to the Send activity. 

As a matter of fact, this customer actually created this ServicePS as a process-specific layer that sits in between the business process and any other service. A similar layer may not be feasible in your case, in which the solution would be to let the error bubble all the way up to the process instance and handle it there (using fault policies).

Thursday, May 12, 2016

Oracle BPM 12c: Browsing the SOAINFRA

In this article I discuss some tables from the SOAINFRA schema that might be most interesting to use when trying to find out why you don't see in Enterprise Manager what you expect.

Going from 11g to 12c, some things have significantly changed in the SOAINFRA schema. For example, your normal partners in helping with "what happened with my process?" type of queries, like the component_instance, and bpm_process tables, have become obsolete. On the other hand you have new friends with tables like sca_flow_instance, and sca_entity.

The following discusses some tables that you might want to look into when digging in the dirt of the SOA/BPM engine's intestines.

The tables I would like to discuss in more detail are:
- sca_flow_instance
- cube_instance
- wftask
- sca_entity
- bpm_cube_process
- bpm_cube_activity

Given that there is no official documentation on these tables, this is based on my observations an interpretations. No guarantee that these are flawless, so if you have anything to improve or add, let me know!

To better understand the data in the SOAINFRA in relation to an actual process, I used 1 composite with the following processes, that has two subprocesses (another BPM process and a BPEL process). The BPM subprocess has not been implemented as a reusable process (with a Call activity) but instead as a process-as-a-service.






As a side note: originally I created this process to be able to verify how the different states a process and its children can have, are represented in Enterprise Manager. The reason being that on one of my projects there were some doubts if this is always correct, given some issues in the past with 11g. With 12c I could find none. However, as the test case does not concern inter-composite interaction, nor does it include all types of technologies, you could argue that the test case is too limited to conclude anything from it. Also worth to mention is that the instances are ran on a server in development mode, and without in-memory optimization. I have heard rumors that you will observer different behavior when you disabled auditing completely. In some next posting I hope to discuss that as well.

I initiated several instances, for each possible state one:


sca_flow_instance

As the name already suggests, this table contains 1 entry for each flow instance. You might be interested in the following columns:
  •   flow_id
  •   title
  •   active_component_instances
  •   recoverable_faults
  •   created_time
  •   updated_time

When queried this looks similar to this:

The query used is like this:

select sfi.flow_id
,      sfi.title
,      sfi.active_component_instances
,      sfi.recoverable_faults
,      sfi.created_time
,      sfi.updated_time
from  sca_flow_instance sfi
order by sfi.created_time

cube_instance

This table contains 1 entry for each component instance in the flow (e.g. bpmn, bpel). You might be interested in the following columns:
  • flow_id
  • composite_label (*)
  • cpst_inst_created_time (**)
  • composite_name
  • composite_revision
  • component_name
  • componenttype
  • state (of the component <== mention)
  • creation_date (incl time)
  • modify_date (incl time)
  • conversation_id

(*) corresponds with the bpm_cube_process.scalabel
(**) equals sca_flow_instance.created_time

When queried this looks similar to this:

The query used is like this:

select cis.flow_id
,      cis.componenttype
,      cis.component_name
,      cis.state
from   cube_instance cis
order by cis.flow_id


wftask


This table contains an entry for each open process activity and open or closed human activity. You might be interested in the following columns:
  • flow_id
  • instanceid
  • processname
  • accesskey (not for human tasks) (*)
  • createddate
  • updateddate
  • (only in case of human tasks, the flex fields)
  • componentname
  • compositename (not for human tasks)
  • conversationid
  • componenttype (***)
  • activityname
  • activityid (****)
  • component_instance_id (only for human tasks)
  • state (*****)

(*) : the type of activity, e.g. USER_TASK, INCLUSIVE_GATEWAY, END_EVENT
(**) not for human tasks
(***) e.g. Workflow, BPMN
(****) Corresponds with the activityid of bpm_cube_activity. The user activity and its corresponding human task appear to have the same activityid. After the human task is completed, the user activity disappears but the human task is kept with an null state.
(*****) e.g. OPEN for running activities, ASSIGNED for running human tasks. Other states are ABORTED, PENDING_MIGRATION_SUSPENDED, ERRORED, etc.

When queried this looks similar to this:


The query used is like this:

select wft.instanceid
,      wft.processname
,      wft.accesskey
,      wft.createddate
,      wft.updateddate
,      wft.componentname
,      wft.compositename
,      wft.conversationid
,      wft.componenttype
,      wft.activityname
,      wft.activityid
,      wft.component_instance_id
,      wft.state
from   wftask wft
where  wft.flow_id = 130001
order by wft.updateddate

sca_entity

This table contains an entry for each SCA entity (e.g. service, wire). The following column might be of use:
  •  id
  •  composite (name)
  •  label (corresponds with the scalabel of bpm_cube_process)

When queried this looks similar to this:


The query used is like this:

select sen.composite
,      sen.id
,      sen.label
from   sca_entity sen
where  sen.composite = 'FlowState'
order by sen.composite

bpm_cube_process

This table contains metadata. For each deployed composite it contains an entry for each BPM process. If 2 BPM processes in once composite: 2 entries. The following columns might be of use:
  • domainname
  • compositename
  • revision
  • processid
  • processname
  • scalabel
  • compositedn
  • creationdate  (incl time)
  • undeploydate
  • migrationstatus (*)
(*) Values are LATEST, MIGRATED.

When queried this looks similar to this:



The query used is like this:


select bcp.domainname
,      bcp.compositename
,      bcp.revision
,      bcp.processname
,      bcp.processid
,      bcp.scalabel
,      bcp.compositedn
,      bcp.creationdate
,      bcp.undeploydate
,      bcp.migrationstatus
from   bpm_cube_process bcp
where  bcp.compositename = 'FlowState'
order by bcp.processname
,        bcp.creationdate


bpm_cube_activity

This table contains metadata, There is an entry for each individual activity, event, and gateway of a bpmn process. The following column might be of use:
  • processid (corresponds with the bpm_cube_process.processid)
  • activityid
  • activityname (technical, internal name can be found in the .bpmn source)
  • activitytype (e.g. START_EVENT, SCRIPT_TASK, CALL_ACTIVITY, etc.)
  • label (name as in the BPMN diagram)
The rows in the example below have been queried by a join with the bpm_cube_process table on processid, where undeploydate is not null and migrationstatus is 'LATEST' to get only the activities of the last revision of one particular process:


The query used is like this:

select cbi.flow_id
,      cbi.composite_label
,      cbi.cpst_inst_created_time
,      cbi.composite_name
,      cbi.composite_revision
,      cbi.component_name
,      cbi.componenttype
,      cbi.state
,      cbi.creation_date
,      cbi.modify_date
,      cbi.conversation_id
from   cube_instance cbi
order by cbi.creation_date

Obsolete Tables

The following table have become obsolete:
  • bpm_activity
  • bpm_activity_instance
  • bpm_cube_activity_instance
  • bpm_process
  • component_instance
The composite_instance is still used, but more or less superseded by the sca_flow_instance (although the number of instances are not the same). I do not longer find it useful to query.

Monday, March 21, 2016

Oracle BPM 11g: Mapping Empty Elements

In this blog article I explain what happens with mappings for which the source is empty, and you map it to an optional or mandatory element. The scenarios described in this article are based on SOA / BPEL 11g. In some next article I will describe what happens when you do the same in SOA 12c (which is not the same).

Let's assume we have a data structure like this:


And let's assume we have a BPEL that takes a message of the above type as input, and - using a couple of different scenarios - maps it to another element of the same type as output.

The table below shows what happens when you map empty data to a mandatory or optional element (i.e. minOccurs="0"), taking payload validation into consideration, as well as making use of the "ignoreMissingFromData" and "insertMissingToData" features of XPath mappings (only available in BPEL and not in BPM). In the below "null" means that the element is not there at all, "empty" means that the element is there but has no value. As you can see from the XSD an emtpy value is nowhere allowed (otherwise it should have an attribute xsi:nill with value "true").



As you can see, disabling payload validation will lead to corrupt data. But even with payload validation on you may get a result that might not be valid in the context of usage, like an empty mandatory or optional element. Unless empty is a valid value, you should make sure that optional elements are not there when they have no value.

To set "ignoreMissingFrom" and "insertMissingToData", right-mouse click the mapping and toggle the values:


When using the "ignoreMissingFromData" feature with a null optional element mapped to itself, the result is as on the left below. When also the "insertMissingToData" feature is used, the result is as on the right:


Mind that the "insertMissingToData" feature also leads to namespace prefixes for each element.