Reputation: 10132
I have written a Spring Batch job that uses SimpleAsyncTaskExecutor. I tried ThreadPoolExecutor, but it gives much slower performance than SimpleAsyncTaskExecutor.
These are the main points about the job:
1. The processing part of the job is the most time-consuming part. It basically fires a large number of SQL SELECTs against DB tables different from those used by the reader and writer. The reader and writer don't take much time; the processor is very complex.
2. There is a functional requirement to write the processor's output to the DB as soon as the processor returns a record. This is needed because the processor finding something for the writer is rare, and we need those results persisted immediately. In a nutshell, it's a business requirement to have chunk size = 1.
I am concerned about the performance of the job. Performance increases manifold if I make the processor logic a bit lighter, so I guess the processor is the bottleneck. I am simply using SimpleAsyncTaskExecutor to achieve parallelism. The job is supposed to run on a powerful multi-processor system.
Any ideas about what more I can do, in terms of Spring Batch, to make this job faster?
The job has this single step:
@Bean
public Step step1(StepBuilderFactory stepBuilderFactory,
        ItemReader<RemittanceVO> syncReader, ItemWriter<RemittanceClaimVO> writer,
        ItemProcessor<RemittanceVO, RemittanceClaimVO> processor) {
    return stepBuilderFactory.get("step1")
            .listener(zeroReadRowsStepExecutionListener)
            .<RemittanceVO, RemittanceClaimVO> chunk(Constants.SPRING_BATCH_CHUNK_SIZE)
            .reader(syncReader)
            .listener(afterReadListener)
            .processor(processor)
            .writer(writer)
            .taskExecutor(simpleAsyntaskExecutor)
            .throttleLimit(Constants.THROTTLE_LIMIT)
            .build();
}
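One possible reason ThreadPoolExecutor looked slower: Spring's ThreadPoolTaskExecutor defaults to a core pool size of 1 with an unbounded queue, so with the defaults all work effectively runs on a single thread. A minimal sketch of an explicitly sized pool, for comparison (bean name, pool sizes, and queue capacity are illustrative and would need tuning for the actual machine):

@Bean
public TaskExecutor stepTaskExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(8);       // e.g. match Constants.THROTTLE_LIMIT or the core count
    executor.setMaxPoolSize(8);        // keep core == max so the pool actually grows
    executor.setQueueCapacity(64);     // small bounded queue; the default is unbounded
    executor.setThreadNamePrefix("step1-");
    executor.initialize();
    return executor;
}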
Upvotes: 0
Views: 526
Reputation: 4444
If you need to have a chunk size of 1, then there is no way to be fast. You produce a lot of overhead (for instance, updating the batch tables for every single item). Moreover, making DB calls out of a processor also has a very negative impact on performance, since you again produce SQL calls for every single item.
The key to good performance - when it comes to working with DBs - is to reduce the number of calls sent to the DB, e.g. by using batch updates or SQL statements that select data for the whole chunk, not for a single item (using an appropriate IN clause).
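To illustrate the IN-clause idea, here is a small self-contained sketch (table and column names are hypothetical) that builds one parameterized SELECT per chunk of keys instead of one SELECT per item, and splits a key list into DB-friendly chunks, since many databases cap the IN-list size:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ChunkQueryBuilder {

    // One SELECT ... IN (?, ?, ...) for a whole chunk of keys,
    // instead of one SELECT per item.
    public static String inClauseSelect(String table, String keyColumn, int chunkSize) {
        String placeholders = String.join(", ", Collections.nCopies(chunkSize, "?"));
        return "SELECT * FROM " + table + " WHERE " + keyColumn + " IN (" + placeholders + ")";
    }

    // Split the full key list into chunks of at most chunkSize elements.
    public static <T> List<List<T>> partition(List<T> items, int chunkSize) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < items.size(); i += chunkSize) {
            chunks.add(items.subList(i, Math.min(i + chunkSize, items.size())));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> ids = List.of(1, 2, 3, 4, 5);
        for (List<Integer> chunk : partition(ids, 2)) {
            System.out.println(inClauseSelect("remittance", "id", chunk.size()) + "  <- " + chunk);
        }
    }
}
```

Each generated statement can then be run once per chunk (e.g. via JDBC with the chunk's keys bound to the placeholders), turning N single-row SELECTs into N/chunkSize round trips.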
When I have to read additional data for processing my items, I use two patterns:
Upvotes: 1