Reputation: 3912
In Mule, I have a large number of records to process, where processing includes some calculations, round trips to the database, and so on. We can process collections of records with these options:
Batch processing
ForEach
Splitter-Aggregator
So what are the main differences between them? When should we prefer one over the others?
The Mule batch processing option does not seem to offer a way to define a variable scoped to the batch job, for example. Or, what if I want to take advantage of multithreading to speed up the overall task? Or, which is better if I want to modify the payload during processing?
Upvotes: 2
Views: 13001
Reputation: 149
I have been passing records as an array to a stored procedure. You can call the stored procedure inside a for loop and set the batch size of the loop so that each call carries many records, which avoids round trips. I have used this approach and performance is good. You may have to create another table to log results and put that logic in the stored procedure as well.
The link below has all the details: https://dzone.com/articles/passing-java-arrays-in-oracle-stored-procedure-fro
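The batching idea above can be sketched in plain Java. This is only an illustration, not the code from the linked article: `RecordChunker` and `chunk` are hypothetical names, and the actual stored-procedure call is left out because it depends on your driver and schema.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits records into fixed-size chunks so that each
// chunk can be passed to one stored-procedure call as an array.
class RecordChunker {
    static <T> List<List<T>> chunk(List<T> records, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < records.size(); start += batchSize) {
            int end = Math.min(start + batchSize, records.size());
            // Copy the sublist so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(records.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            records.add(i);
        }
        // With a batch size of 4, ten records need three calls instead of ten.
        for (List<Integer> c : chunk(records, 4)) {
            System.out.println("would call the stored procedure with " + c);
        }
    }
}
```

A larger batch size means fewer round trips but a bigger array per call, so the right value depends on record size and database limits.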
Upvotes: 0
Reputation: 546
When you write "quite many" I assume it is too much for main memory. This rules out splitter/aggregator, because it has to collect all records to return them as a list.
I assume you have your records in a stream or iterator; otherwise you probably have a memory problem.
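To make the memory argument concrete, here is a minimal plain-Java sketch, not Mule code. `CollectVsStream`, `process`, and both method names are made up for illustration. The first method keeps every result until the end, as an aggregator must; the second hands each result on and forgets it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.IntStream;

class CollectVsStream {
    // Hypothetical per-record step.
    static String process(int record) {
        return "record-" + record;
    }

    // Aggregating style: all results stay in memory until the list is returned.
    static List<String> processAndCollect(Iterator<Integer> records) {
        List<String> results = new ArrayList<>();
        while (records.hasNext()) {
            results.add(process(records.next()));
        }
        return results;
    }

    // Streaming style: each result is handed to the sink and not retained here.
    static long processOneByOne(Iterator<Integer> records, Consumer<String> sink) {
        long count = 0;
        while (records.hasNext()) {
            sink.accept(process(records.next()));
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // The iterator produces records lazily, so the source is never fully in memory.
        Iterator<Integer> source = IntStream.range(0, 1_000_000).iterator();
        long handled = processOneByOne(source, result -> { /* e.g. write to the database */ });
        System.out.println(handled + " records handled without collecting them");
    }
}
```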
So when to use for-each and when to use batch?
For-each is the simplest solution, but it has some drawbacks:
Within the loop, you can have several steps (message processors) to process your records (e.g. for the database lookup you mentioned).
This may be a drawback or an advantage: the loop is synchronous. (If you want to process asynchronously, wrap it in an async scope.)
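As a rough analogy in plain Java (this is not how Mule implements its scopes, and `SyncVsAsyncLoop` and its methods are hypothetical names), the difference between the synchronous loop and the same loop wrapped for asynchronous execution looks like this:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class SyncVsAsyncLoop {
    // Synchronous loop: the caller waits until every record has been handled.
    static List<String> processAll(List<String> records) {
        List<String> out = new ArrayList<>();
        for (String record : records) {
            out.add(record.toUpperCase()); // hypothetical processing step
        }
        return out;
    }

    // Same loop, started on another thread so the caller can continue at once.
    static CompletableFuture<List<String>> processAllAsync(List<String> records) {
        return CompletableFuture.supplyAsync(() -> processAll(records));
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "b", "c");
        CompletableFuture<List<String>> pending = processAllAsync(records);
        System.out.println("caller continues while the loop runs");
        // join() is used here only so the demo prints the result before exiting.
        System.out.println(pending.join());
    }
}
```

Note that the records inside the loop are still handled one after another; only the loop as a whole is moved off the calling thread.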
Batch takes a little more to set up and understand, but it has more features.
So it looks like you had better use batch.
Upvotes: 10
Reputation: 164
With Splitter and Aggregator, you are responsible for writing the splitting logic and then joining the parts back together at the end of processing. It is useful when you want to process records asynchronously using different servers. It is less reliable than the other options, but parallel processing is possible.
Foreach is more reliable, but it processes records iteratively on a single thread (synchronously), so parallel processing is not possible. Each record creates a single message by default.
Batch processing is designed to process millions of records quickly and reliably. By default, 16 threads process your records.
Please go through the links below for more details.
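The effect of a fixed pool of worker threads can be sketched in plain Java. This is an assumption-laden illustration, not Mule's batch engine: `ParallelRecords`, `processRecord`, and `processInParallel` are hypothetical names, and the thread count of 16 simply mirrors the default mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelRecords {
    // Hypothetical per-record work; stands in for calculations and database calls.
    static int processRecord(int record) {
        return record * 2;
    }

    static List<Integer> processInParallel(List<Integer> records, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int record : records) {
                tasks.add(() -> processRecord(record));
            }
            // invokeAll blocks until every task is done and returns futures in task order.
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while processing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a record failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            records.add(i);
        }
        List<Integer> results = processInParallel(records, 16);
        System.out.println(results.size() + " records processed");
    }
}
```

Unlike this sketch, which fails the whole run on the first exception, a batch job is meant to keep going and track failed records, which is part of what "reliable" refers to here.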
https://docs.mulesoft.com/mule-user-guide/v/3.8/splitter-flow-control-reference
https://docs.mulesoft.com/mule-user-guide/v/3.8/foreach
Upvotes: 3