Reputation: 31
Spring Batch job description: delete records from a table. The job will process about 5 million records.
Step: chunk size - 10,000, calls reader and writer
Reader: extends JpaPagingItemReader and reads records from an Oracle DB based on a where clause. Page size - 10,000
Writer: extends JpaItemWriter and deletes the records.
Issue: Running the reader query in SQL Developer shows, say, 90,000 records to be processed, but the batch only processes 50,000. Note that there are no skipped records, the batch exits successfully with a status of COMPLETED, and no errors are logged. When the batch is run again, another 20,000 (out of the remaining 40,000) get processed, and so on.
I am not sure why this is occurring. I would appreciate any help. Thanks a lot.
Step Configuration:
@Bean("CleanupSkuProjStep")
public Step cleanupSkuProjStep()
{
    return stepBuilderFactory.get("cleanupSkuProjStep")
        .<SkuProj, SkuProj>chunk(10000)
        .reader(cleanupSkuProjReader)
        .writer(cleanupSkuProjWriter)
        .listener(cleanupSkuProjChunkListener)
        .build();
}
Reader Configuration:
this.setPageSize(10000);
this.setEntityManagerFactory(entityManagerFactory);
this.setQueryString(sqlString);
Writer has no configs.
Job configuration:
@Bean
public Job job()
{
    log.info("Starting job: CleanupSkuProjJob");
    return jobs.get("CleanupSkuProjJob")
        .listener(jobListener)
        .incrementer(new RunIdIncrementer())
        .start(cleanupSkuProjStep)
        .build();
}
Upvotes: 3
Views: 797
Reputation: 23
I struggled with the same problem. In my case, one job had three steps, and each step followed this flow:
reading -> transforming -> writing (to new tables) -> deleting (from old tables)
As a result, 100% of the records were read, transformed, and written, but only 50% of the records were deleted.
I suspect this is related to the pagination ("iteration") over the records. As we know, we can't remove objects from a list while iterating over it, and I feel that something similar is happening here, but I'm not 100% sure.
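To make the pagination suspicion concrete, here is a minimal plain-Java sketch. It is not Spring Batch code: the class and method names are made up for illustration, and it assumes the reader fetches page N at offset N * pageSize from the rows that still match the query, while the writer deletes each chunk before the next page is read. Under that assumption, the model reproduces the numbers from the question (50,000 of 90,000, then 20,000 of 40,000).

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of offset-based paging over a table that shrinks between pages.
// No Spring or JPA involved; the "table" is just a list of row ids.
// Assumption: page N is read at offset N * pageSize from the rows that still exist.
class PagingDeleteSimulation {

    // Returns how many rows are deleted in one run of the step.
    static int simulate(int totalRows, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<Integer> table = new ArrayList<>();
        for (int id = 0; id < totalRows; id++) {
            table.add(id);
        }

        int deleted = 0;
        int page = 0;
        while (true) {
            // Reader: fetch rows [offset, offset + pageSize) from what is left.
            int offset = page * pageSize;
            if (offset >= table.size()) {
                break; // empty page: the reader reports end of data
            }
            int end = Math.min(offset + pageSize, table.size());
            // Writer: delete the chunk that was just read.
            table.subList(offset, end).clear();
            deleted += end - offset;
            page++; // the offset moves forward although the rows shifted back
        }
        return deleted;
    }

    public static void main(String[] args) {
        System.out.println(simulate(90_000, 10_000)); // prints 50000
        System.out.println(simulate(40_000, 10_000)); // prints 20000
    }
}
```

Each deleted page shifts the remaining rows toward the start, so advancing the offset skips one page of rows for every page deleted.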
I had many records to delete, and I couldn't do it without chunks: without them, the database ran out of memory every time because there were too many records to delete.
What I did: I changed my previous flow to this one:
Step1: reading -> transforming -> writing
Step2: reading -> deleting
Step3: checking whether there are still records to delete
a: If yes, go back to Step2
b: If no, move on
For the check, I used the JobExecutionDecider interface and returned a FlowExecutionStatus with a custom status.
My job flow looks like this:
return jobBuilderFactory
    .get("job-name")
    .start(step1')
    .next(step2')
    .next(step3').on("REPEAT").to(step2').from(step3').on("CONTINUE")
    .to(step1'')
    .next(step2'')
    .next(step3'').on("REPEAT").to(step2'').from(step3'').on("CONTINUE")
    .end()
    .build()
    .listener(someListener)
    .build();
Right now, 100% of the records are transformed, written, and deleted. Step2 still deletes only 50% of the records on each pass, but it repeats as many times as needed until they are all cleared.
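The repeat loop can be modelled the same way in plain Java. This is only a sketch under the same paging assumption (page N is read at offset N * pageSize from the rows that remain). The names are hypothetical, and decide() merely mimics what a decider returning the custom "REPEAT" / "CONTINUE" statuses would do; it does not use the Spring Batch API.

```java
// Plain-Java model of "delete step + decider" repeated until nothing is left.
// Assumption: each pass pages by a growing offset while rows are being deleted,
// so a single pass cannot clear the whole table when there is more than one page.
class RepeatUntilEmptySimulation {

    // Rows deleted by one pass of the delete step (Step2 in the flow above).
    static int deleteOnePass(int remaining, int pageSize) {
        int current = remaining;
        int deleted = 0;
        int page = 0;
        while (page * pageSize < current) {
            int n = Math.min(pageSize, current - page * pageSize);
            current -= n;   // the rows of this page are deleted
            deleted += n;
            page++;         // the next read starts at a larger offset
        }
        return deleted;
    }

    // Stand-in for the decider (Step3): repeat while rows are left.
    static String decide(int remaining) {
        return remaining > 0 ? "REPEAT" : "CONTINUE";
    }

    // Number of passes of the delete step until the decider says CONTINUE.
    static int passesNeeded(int totalRows, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        int remaining = totalRows;
        int passes = 0;
        do {
            remaining -= deleteOnePass(remaining, pageSize);
            passes++;
        } while (decide(remaining).equals("REPEAT"));
        return passes;
    }

    public static void main(String[] args) {
        System.out.println(passesNeeded(90_000, 10_000)); // prints 4
    }
}
```

In this model, 90,000 rows with a page size of 10,000 are cleared in four passes (50,000, then 20,000, then 10,000, then 10,000), so the number of repeats grows slowly with the table size.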
I hope this helps.
Upvotes: 0