Reputation: 6816
From this article we learn that Spring Batch stores the job's status in a SQL repository.
And from this article we learn that the location of the JobRepository is configurable: it can be in-memory or a remote DB.
So if we need to scale a batch job, should we run several separate Spring Batch JARs, all configured to use the same shared DB, in order to keep them synchronized?
Is this the right pattern / architecture?
Upvotes: 0
Views: 285
Reputation: 31590
Yes, this is the way to go. The problem that can occur when you launch the same job from different physical nodes is that the same job instance gets created twice. In that case, Spring Batch will not know which instance to pick up when restarting a failed execution. A shared job repository acts as a safeguard against this kind of concurrency issue.
The job repository achieves this synchronization thanks to the transactional capabilities of the underlying database. The IsolationLevelForCreate can be set to an aggressive value (SERIALIZABLE is the default) in order to avoid the aforementioned issue.
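For illustration, here is a minimal sketch of how each node could point at the same shared job repository with the isolation level set explicitly. It is a configuration fragment, not a complete application: the class name SharedJobRepositoryConfig is hypothetical, and the DataSource and PlatformTransactionManager beans are assumed to be defined elsewhere and to target the same database on every node. Setter availability may vary between Spring Batch versions, so check it against the version you use.

```java
import javax.sql.DataSource;

import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.repository.support.JobRepositoryFactoryBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.PlatformTransactionManager;

// Hypothetical configuration class; every node running the job would use
// the same settings so that all nodes share one job repository.
@Configuration
public class SharedJobRepositoryConfig {

    // Assumption: dataSource and transactionManager are beans defined elsewhere
    // and point at the shared database.
    @Bean
    public JobRepository jobRepository(DataSource dataSource,
                                       PlatformTransactionManager transactionManager) throws Exception {
        JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
        factory.setDataSource(dataSource);
        factory.setTransactionManager(transactionManager);
        // SERIALIZABLE is already the default; it is set here only to make the choice visible.
        factory.setIsolationLevelForCreate("ISOLATION_SERIALIZABLE");
        factory.afterPropertiesSet();
        return factory.getObject();
    }
}
```

With this in place, two nodes that try to create the same job instance at the same time go through the same database transaction rules, so only one of them should succeed in creating it.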
Upvotes: 1