Reputation: 10132
I have written a Spring Batch job that uses SimpleAsyncTaskExecutor. I tried ThreadPoolExecutor, but it gives much slower performance than SimpleAsyncTaskExecutor.
These are the main points about the job:
1. The processing part of the job is the most time-consuming part. It basically fires a large number of SQL SELECTs against DB tables different from those used by the reader and writer. The reader and writer don't take much time; the processor is very complex.
2. There is a functional requirement to write the processor's output to the DB as soon as the processor returns a record. This is needed because the processor finding something for the writer is rare, and we need those results persisted immediately. In a nutshell, it's a business requirement to have chunk size = 1.
I am concerned about the performance of the job. Performance increases manifold if I make the processor logic a bit lighter, so I guess the processor is the bottleneck. I am simply using SimpleAsyncTaskExecutor to achieve parallelism. The job is supposed to run on a powerful multi-processor system.
Any ideas about what more I can do, in terms of Spring Batch, to make this job faster?
The job has this single step:
@Bean
public Step step1(StepBuilderFactory stepBuilderFactory,
        ItemReader<RemittanceVO> syncReader, ItemWriter<RemittanceClaimVO> writer,
        ItemProcessor<RemittanceVO, RemittanceClaimVO> processor) {
    return stepBuilderFactory.get("step1")
            .listener(zeroReadRowsStepExecutionListener)
            .<RemittanceVO, RemittanceClaimVO> chunk(Constants.SPRING_BATCH_CHUNK_SIZE)
            .reader(syncReader)
            .listener(afterReadListener)
            .processor(processor)
            .writer(writer)
            .taskExecutor(simpleAsyntaskExecutor)
            .throttleLimit(Constants.THROTTLE_LIMIT)
            .build();
}
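One possible reason ThreadPoolExecutor looked slower: Spring's ThreadPoolTaskExecutor defaults to a core pool size of 1 with an unbounded queue, so with the defaults all work effectively runs on a single thread. A minimal sketch of an explicitly sized pool, for comparison (bean name, pool sizes, and queue capacity are illustrative and would need tuning for the actual machine):

@Bean
public TaskExecutor stepTaskExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(8);       // e.g. match Constants.THROTTLE_LIMIT or the core count
    executor.setMaxPoolSize(8);        // keep core == max so the pool actually grows
    executor.setQueueCapacity(64);     // small bounded queue; the default is unbounded
    executor.setThreadNamePrefix("step1-");
    executor.initialize();
    return executor;
}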
Upvotes: 0
Views: 526
Reputation: 4444
If you need to have a chunk size of 1, then there is no way to be fast. You produce a lot of overhead (for instance, updating the batch tables for every single item). Moreover, making DB calls out of a processor also has a very negative impact on performance, since you again produce SQL calls for every single item.
The key to good performance - when it comes to working with DBs - is to reduce the number of calls sent to the DB, e.g. by using batch updates or SQL statements that select data for the whole chunk, not for a single item (using an appropriate IN clause).
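To illustrate the IN-clause idea, here is a small self-contained sketch (table and column names are hypothetical) that builds one parameterized SELECT per chunk of keys instead of one SELECT per item, and splits a key list into DB-friendly chunks, since many databases cap the IN-list size:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ChunkQueryBuilder {

    // One SELECT ... IN (?, ?, ...) for a whole chunk of keys,
    // instead of one SELECT per item.
    public static String inClauseSelect(String table, String keyColumn, int chunkSize) {
        String placeholders = String.join(", ", Collections.nCopies(chunkSize, "?"));
        return "SELECT * FROM " + table + " WHERE " + keyColumn + " IN (" + placeholders + ")";
    }

    // Split the full key list into chunks of at most chunkSize elements.
    public static <T> List<List<T>> partition(List<T> items, int chunkSize) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < items.size(); i += chunkSize) {
            chunks.add(items.subList(i, Math.min(i + chunkSize, items.size())));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> ids = List.of(1, 2, 3, 4, 5);
        for (List<Integer> chunk : partition(ids, 2)) {
            System.out.println(inClauseSelect("remittance", "id", chunk.size()) + "  <- " + chunk);
        }
    }
}
```

Each generated statement can then be run once per chunk (e.g. via JDBC with the chunk's keys bound to the placeholders), turning N single-row SELECTs into N/chunkSize round trips.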
When I have to read additional data for processing my items, I use two patterns:
Upvotes: 1