Reputation: 133
I need to add aggregation to my Spring Batch jobs, but the aggregation step needs the entire data set to be available.
In pure SQL, aggregation queries are easy to write: the full data set (as stored in the database) is available.
But in Spring Batch jobs, everything is done in memory and split into chunks. So how do I deal with data that is scattered across chunks?
Do you have any advice on best practices for adding aggregation steps/processes?
Thanks a lot for your insights.
Upvotes: 1
Views: 5480
Reputation: 3784
Spring Batch has a partitioning option, and a partitioned step can be configured with a StepExecutionAggregator. Its aggregate method receives the master step's StepExecution together with the collection of StepExecutions of all the partitioned (child) steps.
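To make the aggregator contract concrete, here is a minimal plain-Java sketch that only mirrors the shape of aggregate(StepExecution result, Collection<StepExecution> executions). It does not use Spring Batch: the StepExecution class below is a hypothetical stand-in that models nothing but a read count, and SumReadCountAggregator is an illustrative name. Spring Batch's own DefaultStepExecutionAggregator already sums the built-in counters in a similar way; a custom aggregator is where you would combine your own data.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Plain-Java model of the StepExecutionAggregator contract; NOT the Spring Batch API.
public class AggregatorContractSketch {

    // Hypothetical stand-in for Spring Batch's StepExecution: only a name and a read count.
    static final class StepExecution {
        final String stepName;
        long readCount;

        StepExecution(String stepName) {
            this.stepName = stepName;
        }
    }

    // Same shape as aggregate(StepExecution result, Collection<StepExecution> executions).
    interface StepExecutionAggregator {
        void aggregate(StepExecution result, Collection<StepExecution> executions);
    }

    // Hypothetical aggregator: adds every worker's read count to the master execution.
    static final class SumReadCountAggregator implements StepExecutionAggregator {
        @Override
        public void aggregate(StepExecution result, Collection<StepExecution> executions) {
            for (StepExecution worker : executions) {
                result.readCount += worker.readCount;
            }
        }
    }

    // Builds one worker execution per count, then aggregates them into a fresh master execution.
    static StepExecution aggregateMaster(long... workerReadCounts) {
        List<StepExecution> workers = new ArrayList<>();
        for (int i = 0; i < workerReadCounts.length; i++) {
            StepExecution worker = new StepExecution("workerStep:partition" + i);
            worker.readCount = workerReadCounts[i];
            workers.add(worker);
        }
        StepExecution master = new StepExecution("masterStep");
        new SumReadCountAggregator().aggregate(master, workers);
        return master;
    }

    public static void main(String[] args) {
        StepExecution master = aggregateMaster(10, 25, 7);
        System.out.println("master readCount = " + master.readCount); // prints "master readCount = 42"
    }
}
```

In a real job you would implement Spring Batch's own StepExecutionAggregator interface and register it on the partitioned master step (for example through the partition step builder's aggregator option).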
For example, we had an integration with a SOAP server where we first received a list of things to be processed, then partitioned it into child steps that were processed in parallel. Once all the child steps had finished, the aggregator was invoked, and it could act on the data stored in the child steps' contexts.
This is a good approach if your data has something that makes a good partitioning rule (e.g. pull a list of items from the DB, process each item in parallel, save each item's data in its step's execution context, then use the aggregator to combine the values from every step's context and run a common operation on the combined data).
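The flow described above (partition, process in parallel, store partial results per partition, aggregate once at the end) can be sketched in plain Java. This is not Spring Batch code: the maps stand in for each partition's ExecutionContext, a fixed thread pool stands in for a parallel partition handler, and partition, runWorker, aggregate and run are hypothetical names. Summing integers is an assumed example workload. In Spring Batch the corresponding pieces would be a Partitioner returning a Map of ExecutionContexts, a worker step that writes its partial result into its own step ExecutionContext, and a StepExecutionAggregator that reads those values from each worker StepExecution.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-Java model of partition -> parallel workers -> aggregate; NOT Spring Batch code.
public class PartitionAggregateSketch {

    // Like a Partitioner: one context map per partition, each holding an index range [from, to).
    static Map<String, Map<String, Object>> partition(int itemCount, int gridSize) {
        Map<String, Map<String, Object>> contexts = new HashMap<>();
        int size = (itemCount + gridSize - 1) / gridSize; // ceiling division
        for (int i = 0; i < gridSize; i++) {
            int from = i * size;
            int to = Math.min(from + size, itemCount);
            if (from >= to) break; // no items left for further partitions
            Map<String, Object> ctx = new HashMap<>();
            ctx.put("from", from);
            ctx.put("to", to);
            contexts.put("partition" + i, ctx);
        }
        return contexts;
    }

    // Worker "step": processes its slice and stores a partial result in its own context.
    static Map<String, Object> runWorker(List<Integer> items, Map<String, Object> ctx) {
        int from = (Integer) ctx.get("from");
        int to = (Integer) ctx.get("to");
        long partialSum = 0;
        for (int i = from; i < to; i++) partialSum += items.get(i);
        ctx.put("partialSum", partialSum);
        return ctx;
    }

    // Aggregation: runs once, after all workers are done, over every worker context.
    static long aggregate(List<Map<String, Object>> workerContexts) {
        long total = 0;
        for (Map<String, Object> ctx : workerContexts) total += (Long) ctx.get("partialSum");
        return total;
    }

    static long run(List<Integer> items, int gridSize) {
        ExecutorService pool = Executors.newFixedThreadPool(gridSize);
        try {
            List<Future<Map<String, Object>>> futures = new ArrayList<>();
            for (Map<String, Object> ctx : partition(items.size(), gridSize).values()) {
                futures.add(pool.submit(() -> runWorker(items, ctx)));
            }
            List<Map<String, Object>> done = new ArrayList<>();
            for (Future<Map<String, Object>> f : futures) done.add(f.get()); // blocks until that worker ends
            return aggregate(done);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a worker failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        for (int i = 1; i <= 100; i++) items.add(i);
        System.out.println("total = " + run(items, 4)); // prints "total = 5050"
    }
}
```

The key design point is that no single chunk ever needs the whole data set: each partition keeps only a small partial result, and the aggregation step combines those partial results after every partition has completed.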
Here is a link to an example with partitioning (it has no aggregation, but you can add it to masterStep).
Upvotes: 8