Reputation: 109
We are implementing a remote partitioning job using Spring Batch and Spring Integration. For this job we receive one big file, split it into smaller files with a Unix command, and then run the batch against those files. Is there a way to write custom partitioning logic for the large file without splitting it first? Any help would be appreciated.
Thanks in advance.
-MK
Upvotes: 1
Views: 488
Reputation: 21463
There is a Jira issue for Spring Batch (BATCH-1613, with a related pull request) to support multithreaded file reading. However, we've found that the benefits of using multiple threads are very environment-specific. In typical environments, you can't get the data off the disk fast enough from a single file to keep all your partitions busy. When I tested the pull request, I saw no benefit over reading a single file with a single thread, which is why it wasn't merged (even though the author was able to present stats that showed increased speed in his environment).
If the linked code benefits you, please feel free to use it. However, I'd want to be sure that the benefits are achievable in more environments (or at least have a more concrete understanding of the requirements for it to gain the benefits so they could be documented) before merging it into the framework itself.
If you work out something that works for you (either based on the code linked or something else), we'd love a pull request!
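As one possible starting point for the "something else" option, here is a minimal sketch of the range arithmetic a custom partitioner could use to divide a single file by line number instead of splitting it on disk. It is not the code from the pull request. The class name `LineRangePartitions` and the method `split` are hypothetical, and the sketch assumes you already know the total line count. In a real job you would presumably copy each range into an `ExecutionContext` inside your own `Partitioner.partition(int gridSize)` implementation, and each worker's `FlatFileItemReader` could then be bounded with `setCurrentItemCount` and `setMaxItemCount`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Spring Batch: computes contiguous,
// non-overlapping line ranges that together cover one large file.
final class LineRangePartitions {

    // Splits lines [0, totalLines) into at most gridSize ranges.
    // Each value is {startInclusive, endExclusive}.
    static Map<String, int[]> split(int totalLines, int gridSize) {
        if (totalLines < 0 || gridSize < 1) {
            throw new IllegalArgumentException("totalLines >= 0 and gridSize >= 1 required");
        }
        Map<String, int[]> ranges = new LinkedHashMap<>();
        int base = totalLines / gridSize;      // minimum lines per partition
        int remainder = totalLines % gridSize; // the first 'remainder' partitions get one extra line
        int start = 0;
        for (int i = 0; i < gridSize && start < totalLines; i++) {
            int end = start + base + (i < remainder ? 1 : 0);
            ranges.put("partition" + i, new int[] {start, end});
            start = end;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Example: 10 lines over 3 partitions.
        split(10, 3).forEach((name, r) ->
                System.out.println(name + ": lines [" + r[0] + ", " + r[1] + ")"));
    }
}
```

Note that with this approach every worker still has to read past the lines before its own range, so the disk throughput caveat described above still applies. Measure it in your own environment before relying on it.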
Upvotes: 1