Sue

Reputation: 31

Spring job not processing all records but exits with a status of Complete

Spring job description: Delete records from a table. Will be processing about 5 million records.

Step: chunk size - 10,000, calls reader and writer

Reader: extends JpaPagingItemReader and reads records from an Oracle DB based on a where clause. Page size - 10,000

Writer: extends JpaItemWriter and deletes the records.

Issue: The records to be processed by the batch number, say, 90,000 (confirmed by running the reader query in SQL Developer), but the batch only processes 50,000. NOTE: there are no skipped records, the batch exits successfully with a status of Complete, and no errors are logged either. When the batch is run again, another 20,000 (out of the remaining 40,000) get processed, and so on...

I am not sure why this is happening. Any help is appreciated. Thanks a lot.


Step Configuration:

@Bean("CleanupSkuProjStep") 
public Step cleanupSkuProjStep() 
{ 
    return stepBuilderFactory.get("cleanupSkuProjStep") .<SkuProj, SkuProj>chunk(10000) .reader(cleanupSkuProjReader) .writer(cleanupSkuProjWriter) .listener(cleanupSkuProjChunkListener) .build(); 
}

Reader Configuration:

this.setPageSize(10000);
this.setEntityManagerFactory(entityManagerFactory);
this.setQueryString(sqlString);

Writer has no configs.
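
For reference, a delete-style writer along these lines might look like the sketch below. The class and constructor names are hypothetical (the original writer isn't shown in the question), and it assumes Spring Batch 4.x (the List-based write signature) with JPA:

import java.util.List;

import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;

import org.springframework.batch.item.database.JpaItemWriter;
import org.springframework.dao.DataAccessResourceFailureException;
import org.springframework.orm.jpa.EntityManagerFactoryUtils;

// Hypothetical sketch of a delete-style writer; not the asker's actual class.
public class CleanupSkuProjWriter extends JpaItemWriter<SkuProj> {

    private final EntityManagerFactory entityManagerFactory;

    public CleanupSkuProjWriter(EntityManagerFactory entityManagerFactory) {
        this.entityManagerFactory = entityManagerFactory;
        setEntityManagerFactory(entityManagerFactory);
    }

    @Override
    public void write(List<? extends SkuProj> items) {
        // Use the EntityManager bound to the chunk's transaction.
        EntityManager entityManager = EntityManagerFactoryUtils
                .getTransactionalEntityManager(entityManagerFactory);
        if (entityManager == null) {
            throw new DataAccessResourceFailureException(
                    "Unable to obtain a transactional EntityManager");
        }
        for (SkuProj item : items) {
            // Re-attach detached entities before removing them.
            entityManager.remove(entityManager.contains(item) ? item : entityManager.merge(item));
        }
        entityManager.flush();
    }
}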

Job configuration:

@Bean
public Job job()
{
    log.info("Starting job: CleanupSkuProjJob");
    return jobs.get("CleanupSkuProjJob")
            .listener(jobListener)
            .incrementer(new RunIdIncrementer())
            .start(cleanupSkuProjStep)
            .build();
}

Upvotes: 3

Views: 797

Answers (1)

Mossi

Reputation: 23

I struggled with the same problem. In my case, one job had three steps, and each step followed this flow:

reading -> transforming -> writing(to new tables) -> deleting(from old tables)

As a result, 100% of the records were read, transformed, and written, but only 50% of them were deleted.

I suppose the situation is related to the pagination ("iteration") of the records. As we know, we can't remove objects from a list while iterating over it, and I suspect something similar is happening here, but I'm not 100% sure.
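
Editor's sketch of one commonly suggested workaround, separate from the flow-based approach described below: pin the reader to the first page, since each deleted chunk shifts the remaining rows to the front of the result set. The class name is hypothetical, and it assumes JpaPagingItemReader computes its query offset as getPage() * getPageSize():

import org.springframework.batch.item.database.JpaPagingItemReader;

// Sketch of a reader that always re-reads the first page. When the writer
// deletes every row of a chunk, the surviving rows shift to the start of the
// result set, so advancing the page offset skips roughly every other page.
public class DeleteSafeSkuProjReader extends JpaPagingItemReader<SkuProj> {

    @Override
    public int getPage() {
        // Keep the JPQL offset at 0 so no surviving rows are skipped.
        return 0;
    }
}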

I had many records to delete and couldn't do it without chunks; I needed them. On the other hand, the database kept running out of memory because there were too many records to delete at once.

What I did was change my previous flow to this one:

Step1: reading -> transforming -> writing
Step2: reading -> deleting
Step3: checking whether there are still records to delete
a: If yes, go to Step2
b: If no, go forward

For the check, I used the JobExecutionDecider interface and returned a FlowExecutionStatus with a custom status.
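
A minimal sketch of such a decider might look like this. How the remaining-row count is obtained (here a LongSupplier, e.g. backed by a COUNT query) is an assumption, since the answer does not show it:

import java.util.function.LongSupplier;

import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.job.flow.FlowExecutionStatus;
import org.springframework.batch.core.job.flow.JobExecutionDecider;

// Sketch of the decider described above; names and the count source are assumptions.
public class RemainingRecordsDecider implements JobExecutionDecider {

    private final LongSupplier remainingCount;

    public RemainingRecordsDecider(LongSupplier remainingCount) {
        this.remainingCount = remainingCount;
    }

    @Override
    public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
        // Loop back to the delete step while rows are left, otherwise continue.
        return remainingCount.getAsLong() > 0
                ? new FlowExecutionStatus("REPEAT")
                : new FlowExecutionStatus("CONTINUE");
    }
}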

And my job flow looks like that:

return jobBuilderFactory
    .get("job-name")
    .start(step1')
    .next(step2')
    .next(step3').on("REPEAT").to(step2').from(step3').on("CONTINUE")
    .to(step1'')
    .next(step2'')
    .next(step3'').on("REPEAT").to(step2'').from(step3'').on("CONTINUE")
    .end()
    .listener(someListener)
    .build();

Right now, 100% of the records are transformed, written, and deleted. Step2 still only deletes about 50% of the records per pass, but it repeats as many times as needed until it clears them all.

I hope this helps.

Upvotes: 0
