victorio

Reputation: 6646

Spring Batch dynamic chunk size based on the number of rows from a CSV without counting the header row

My application is a scheduled job runner with batch configurations.

I can have CSV files with different numbers of rows, but I know that the first row will always be the header:

id,firstName,lastName
1,Viktor,Someone
2,Joe,Smith
3,Rebecca,Harper

How should I set up the chunk to be dynamic? The file can contain 5, 10, or even 100000 rows.

So instead of hard-coding a large chunk size, I am looking for a better, dynamic solution based on the number of rows, excluding the header row!

@Bean
public Step step1() {
    return stepBuilderFactory.get("step1").<Employee, Employee>chunk(100000)
            .reader(reader())
            .writer(writer())
            .build();
}

The reader is a FlatFileItemReader.

Upvotes: 0

Views: 1739

Answers (1)

Mahmoud Ben Hassine

Reputation: 31600

What about the following:

@Bean
public Step step1() throws IOException {
    // Count the data lines, excluding the header row.
    // Close the stream (try-with-resources) so the file handle is released.
    long lineCountWithoutHeader;
    try (Stream<String> lines = Files.lines(Paths.get("path to your file"))) {
        lineCountWithoutHeader = lines.count() - 1;
    }
    int chunkSize = .. // calculate chunk size based on lineCountWithoutHeader
    return stepBuilderFactory.get("step1").<Employee, Employee>chunk(chunkSize)
            .reader(reader())
            .writer(writer())
            .build();
}

You can refactor the code as needed (inject the file resource or late-bind it from job parameters, extract the calculation logic into a separate method, etc.), but you get the idea.
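For example, the counting logic could be extracted into a plain helper class that is easy to unit-test. This is only a sketch: the class name, method names, and the divide-into-ten-chunks heuristic are assumptions for illustration, not part of the original answer.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

public class ChunkSizeCalculator {

    // Counts the data rows in a CSV file, excluding the single header row.
    static long countDataRows(Path csv) throws IOException {
        try (Stream<String> lines = Files.lines(csv)) {
            return Math.max(0, lines.count() - 1);
        }
    }

    // Hypothetical heuristic: split the file into roughly 10 chunks,
    // clamped between 1 and 10_000 items per chunk.
    static int chunkSizeFor(long dataRows) {
        long size = Math.max(1, dataRows / 10);
        return (int) Math.min(size, 10_000);
    }

    public static void main(String[] args) throws IOException {
        // Build a small sample file matching the question's CSV.
        Path csv = Files.createTempFile("employees", ".csv");
        Files.write(csv, List.of(
                "id,firstName,lastName",
                "1,Viktor,Someone",
                "2,Joe,Smith",
                "3,Rebecca,Harper"));
        long rows = countDataRows(csv);
        System.out.println(rows + " " + chunkSizeFor(rows)); // prints "3 1"
        Files.deleteIfExists(csv);
    }
}
```

With the logic isolated like this, `step1()` only needs to call `chunkSizeFor(countDataRows(path))` and stays free of file-handling code.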

Another option would be to use a separate step that does the calculation and put it in the job execution context, then configure your chunk-oriented step with the value from the execution context.
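That second option could look roughly like the following configuration sketch (untested; the step names, the `lineCount` key, and the `@JobScope`/`@Value` late binding are assumptions about how you would wire it up):

```java
// Sketch: a first step counts the data rows and stores the result in the
// job execution context; the chunk-oriented step is job-scoped so it can
// read that value when the step bean is created.
@Bean
public Step countingStep() {
    return stepBuilderFactory.get("countingStep")
            .tasklet((contribution, chunkContext) -> {
                long count;
                try (Stream<String> lines = Files.lines(Paths.get("path to your file"))) {
                    count = lines.count() - 1; // exclude the header row
                }
                chunkContext.getStepContext().getStepExecution()
                        .getJobExecution().getExecutionContext()
                        .putLong("lineCount", count);
                return RepeatStatus.FINISHED;
            })
            .build();
}

@Bean
@JobScope
public Step step1(@Value("#{jobExecutionContext['lineCount']}") Long lineCount) {
    int chunkSize = .. // derive chunk size from lineCount
    return stepBuilderFactory.get("step1").<Employee, Employee>chunk(chunkSize)
            .reader(reader())
            .writer(writer())
            .build();
}
```

The job would then run `countingStep` before `step1`, so the chunk size is recomputed per execution instead of being fixed at configuration time.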

Upvotes: 1
