Reputation: 2991
We process lots of files (around 500) overnight, and those files come in every few minutes, in groups of 30-50. Is it a good idea to launch a job for each file, or to group them and process them using a multi-threaded step?
Upvotes: 2
Views: 1247
Reputation: 21503
Instead of going multi-threaded directly or launching a job per file, I'd recommend using partitioning. Using the MultiResourcePartitioner, you can create a partition per file, which means each file gets its own step. By doing this, you can avoid some of the threading complexities (step-scoped stateful components) and still maintain things like restartability and the independent execution of each file within the "batch" (run of the job). You can read more about partitioning in the documentation here: http://docs.spring.io/spring-batch/trunk/reference/html/scalability.html
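For illustration, here is a minimal sketch of a file-per-partition configuration, assuming the Java-config builder style (JobBuilderFactory/StepBuilderFactory) and a hypothetical input directory; MultiResourcePartitioner puts each file's URL into the step execution context under the key "fileName", which a step-scoped reader can bind to:

```java
import java.io.IOException;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.core.partition.support.MultiResourcePartitioner;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.mapping.PassThroughLineMapper;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.Resource;
import org.springframework.core.io.support.PathMatchingResourcePatternResolver;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

@Configuration
@EnableBatchProcessing
public class FilePartitioningJobConfig {

    @Bean
    public MultiResourcePartitioner partitioner() throws IOException {
        // One partition (and therefore one worker step execution) per file.
        MultiResourcePartitioner partitioner = new MultiResourcePartitioner();
        partitioner.setResources(new PathMatchingResourcePatternResolver()
                .getResources("file:/data/inbound/*.csv")); // hypothetical directory
        return partitioner;
    }

    @Bean
    @StepScope
    public FlatFileItemReader<String> reader(
            // The partitioner stores each file's URL under "fileName" in the
            // step execution context; step scope binds it here per partition.
            @Value("#{stepExecutionContext['fileName']}") Resource file) {
        FlatFileItemReader<String> reader = new FlatFileItemReader<>();
        reader.setResource(file);
        reader.setLineMapper(new PassThroughLineMapper());
        return reader;
    }

    @Bean
    public Step workerStep(StepBuilderFactory steps) {
        return steps.get("workerStep")
                .<String, String>chunk(100)
                .reader(reader(null)) // actual resource injected at runtime
                .writer(items -> { /* write the chunk somewhere */ })
                .build();
    }

    @Bean
    public Step masterStep(StepBuilderFactory steps) throws IOException {
        SimpleAsyncTaskExecutor executor = new SimpleAsyncTaskExecutor("file-");
        executor.setConcurrencyLimit(10); // cap on files processed concurrently
        return steps.get("masterStep")
                .partitioner("workerStep", partitioner())
                .step(workerStep(steps))
                .taskExecutor(executor)
                .build();
    }

    @Bean
    public Job fileJob(JobBuilderFactory jobs, Step masterStep) {
        return jobs.get("fileJob").start(masterStep).build();
    }
}
```

Note that the concurrency cap comes from the task executor here, since MultiResourcePartitioner creates one partition per resource regardless of grid size.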
Upvotes: 2
Reputation: 111
It looks like the order in which files are processed does not matter.
I would use an instance of the batch job per file and not a multi-threaded step (a minimal launch sketch follows this list). Some advantages of using separate job instances are:
It is easier to implement than a multi-threaded step.
Errors in one file will not affect the processing of other files.
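Here is a minimal sketch of the job-per-file launch, assuming a Job bean (fileProcessingJob is a hypothetical name) that processes a single file whose path arrives as an identifying job parameter; each distinct path then yields its own job instance with independent restartability:

```java
import java.io.File;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;

public class PerFileJobRunner {

    private final JobLauncher jobLauncher;
    private final Job fileProcessingJob; // hypothetical single-file job

    public PerFileJobRunner(JobLauncher jobLauncher, Job fileProcessingJob) {
        this.jobLauncher = jobLauncher;
        this.fileProcessingJob = fileProcessingJob;
    }

    // Launch one job instance per incoming file.
    public void launchForEach(File[] files) throws Exception {
        for (File file : files) {
            JobParameters params = new JobParametersBuilder()
                    // Identifying parameter: a distinct path means a distinct
                    // job instance, so a failed file can be restarted alone.
                    .addString("input.file", file.getAbsolutePath())
                    .toJobParameters();
            jobLauncher.run(fileProcessingJob, params);
        }
    }
}
```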
If your files are very large, you can implement a multi-threaded step to process the records of a single file in parallel (sketched at the end of this answer). This is something I would consider only if performance is not up to expectations.
Multi-threaded programming in general is hard. Spring Batch does a good job of abstracting the complexities of parallel processing, but I have found that there are usually nuances to deal with, so it is best to avoid multi-threaded steps if you can.
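For reference, a minimal sketch of the multi-threaded step mentioned above, assuming thread-safe reader and writer beans (the names are hypothetical; a stateful reader such as FlatFileItemReader would need to be wrapped, e.g. in a SynchronizedItemStreamReader):

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

public class MultiThreadedStepConfig {

    // Chunks are handed to concurrent threads, so the reader and writer
    // must both be safe to call from multiple threads at once.
    public Step multiThreadedStep(StepBuilderFactory steps,
                                  ItemReader<String> threadSafeReader,
                                  ItemWriter<String> threadSafeWriter) {
        SimpleAsyncTaskExecutor executor = new SimpleAsyncTaskExecutor("records-");
        executor.setConcurrencyLimit(4); // limit concurrent chunk workers
        return steps.get("multiThreadedStep")
                .<String, String>chunk(100)
                .reader(threadSafeReader)
                .writer(threadSafeWriter)
                .taskExecutor(executor)
                .build();
    }
}
```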
Upvotes: 1