Oren

Reputation: 447

How to split the work of a Spring Batch job between several machines when the flow reads from a single file?

My flow is:

  1. Read from a single file (file size ~1 TB)
  2. Process each row
  3. Write each row to 2 output files

How can I split the work between more than one machine in order to reduce the overall run time?

Upvotes: 0

Views: 124

Answers (1)

Mahmoud Ben Hassine

Reputation: 31745

There are at least three techniques for this use case:

  • Physically partition the file using the split command (or equivalent) to create multiple partitions. Then use a partitioned step to process each partition.
  • Logically partition the file (see FlatFilePartitioner in the sample attached to BATCH-1613) and use a partitioned step to process each partition.
  • Load the file into a staging table, then use a partitioned step to process ranges of rows in the table (for example IDs 1 -> 1000, 1001 -> 2000, etc.).
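
To make the range idea in the last two options more concrete, here is a minimal sketch in plain Java of how the ranges could be computed. The class name `RangePartitions` and the method `split` are hypothetical and not part of Spring Batch. In a real job you would typically do this inside your own `Partitioner` implementation and put each range into the `ExecutionContext` of one partition. The same arithmetic should work for line numbers in the logical-partitioning option.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; it is not a Spring Batch class.
final class RangePartitions {

    // Splits the inclusive range [minId, maxId] into at most gridSize
    // contiguous, non-overlapping ranges. Each element is {start, end}.
    static List<long[]> split(long minId, long maxId, int gridSize) {
        if (gridSize <= 0 || maxId < minId) {
            throw new IllegalArgumentException("gridSize must be > 0 and maxId >= minId");
        }
        long total = maxId - minId + 1;
        long chunk = (total + gridSize - 1) / gridSize; // ceiling division
        List<long[]> ranges = new ArrayList<>();
        for (long start = minId; start <= maxId; start += chunk) {
            long end = Math.min(start + chunk - 1, maxId);
            ranges.add(new long[] {start, end});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Example: 10,000 staged rows spread over 4 partitions.
        for (long[] r : split(1, 10_000, 4)) {
            System.out.println(r[0] + " -> " + r[1]);
        }
    }
}
```

Each worker would then read only the rows (or lines) in its own range, so no row is processed twice.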

Hope this helps.

Upvotes: 1
