Reputation: 850
I have a Spring Batch application that reads from and writes to the same table. Because the data volume is quite high, I use pagination to read the items from the table. When I set the chunk size to more than 1, the page number is updated incorrectly, and the reader fails to read some items from the table. Any idea?
@Bean
public Step fooStep1() {
    return stepBuilderFactory.get("step1")
            .<foo, foo>chunk(chunkSize)
            .reader(fooTableReader())
            .writer(fooTableWriter())
            .listener(fooStepListener())
            .listener(chunkListener())
            .build();
}
Reader
@Bean
@StepScope
public ItemReader<foo> fooBatchReader() {
    NonSortingRepositoryItemReader<foo> reader = new NonSortingRepositoryItemReader<>();
    reader.setRepository(service.getRepository());
    reader.setPageSize(chunkSize);
    reader.setMethodName("findAllByStatusCode");
    // Build the argument list completely before handing it to the reader
    List<Object> arguments = new ArrayList<>();
    arguments.add(statusCode);
    reader.setArguments(arguments);
    return reader;
}
Upvotes: 2
Views: 2758
Reputation: 4454
Don't use a pagination reader. The problem is that this reader executes a new query for every page, which in your setup means for every chunk. Therefore, if you add or change rows in the same table while writing, the successive queries will not return the same result.
If you look at the source code of the paging reader, this is easy to see.
If you modify the same table you are reading from, you have to ensure that your result set doesn't change while the whole step is being processed; otherwise your results may be unpredictable and very likely not what you wanted.
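To make the effect concrete, here is a small plain-Java simulation (no Spring involved). It is only a sketch under assumptions: the in-memory `table`, the `Row` class, and the status values "NEW" and "DONE" are made up for illustration, and `findPage` merely imitates what a paged "find by status" query does (filter, then skip and limit). The "writer" flips the status of every row it receives, as I assume yours does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class PagingDemo {
    /** Hypothetical row: an id plus a mutable status column. */
    static final class Row {
        final int id;
        String status;
        Row(int id, String status) { this.id = id; this.status = status; }
    }

    /** Builds a table with ids 1..n, all in status "NEW". */
    static List<Row> newTable(int n) {
        List<Row> table = new ArrayList<>();
        for (int i = 1; i <= n; i++) table.add(new Row(i, "NEW"));
        return table;
    }

    /** Imitates one paged query: filter by status, then skip and limit. */
    static List<Row> findPage(List<Row> table, String status, int page, int pageSize) {
        return table.stream()
                .filter(r -> r.status.equals(status))
                .skip((long) page * pageSize)
                .limit(pageSize)
                .collect(Collectors.toList());
    }

    /** Runs a fresh query per page and updates each chunk right after reading it. */
    static List<Integer> processWithPaging(List<Row> table, int pageSize) {
        List<Integer> processed = new ArrayList<>();
        int page = 0;
        while (true) {
            List<Row> chunk = findPage(table, "NEW", page, pageSize);
            if (chunk.isEmpty()) break;
            for (Row r : chunk) {
                r.status = "DONE";      // the "write" changes the filtered column
                processed.add(r.id);
            }
            page++;                     // the offset advances, but the result set has shrunk
        }
        return processed;
    }

    /** Runs the query once up front, then processes that fixed result in chunks. */
    static List<Integer> processWithSnapshot(List<Row> table, int chunkSize) {
        List<Row> snapshot = findPage(table, "NEW", 0, Integer.MAX_VALUE);
        List<Integer> processed = new ArrayList<>();
        for (int from = 0; from < snapshot.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, snapshot.size());
            for (Row r : snapshot.subList(from, to)) {
                r.status = "DONE";
                processed.add(r.id);
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(processWithPaging(newTable(10), 2));   // [1, 2, 5, 6, 9, 10]
        System.out.println(processWithSnapshot(newTable(10), 2)); // [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    }
}
```

In the paged run, rows 3, 4, 7 and 8 are never read: after the first chunk is written, the rows that used to be on page 1 have moved to page 0, but the reader asks for page 1 next.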
Try using a JdbcCursorItemReader instead. It executes the query once at the beginning of the step, so the result set is defined at that point and will not change while the step is being processed.
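A minimal configuration sketch of such a reader, using Spring Batch's JdbcCursorItemReaderBuilder. Everything specific here is an assumption, since I don't know your schema: the table name `foo_table`, the column `status_code`, the `Foo` class with its two properties, and passing `statusCode` as a job parameter are all placeholders for whatever your application actually uses.

```java
import javax.sql.DataSource;

import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.item.database.JdbcCursorItemReader;
import org.springframework.batch.item.database.builder.JdbcCursorItemReaderBuilder;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.BeanPropertyRowMapper;

@Configuration
public class FooCursorReaderConfig {

    /** Hypothetical item class; stands in for the question's entity. */
    public static class Foo {
        private long id;
        private String statusCode;
        public long getId() { return id; }
        public void setId(long id) { this.id = id; }
        public String getStatusCode() { return statusCode; }
        public void setStatusCode(String statusCode) { this.statusCode = statusCode; }
    }

    @Bean
    @StepScope
    public JdbcCursorItemReader<Foo> fooCursorReader(
            DataSource dataSource,
            @Value("#{jobParameters['statusCode']}") String statusCode) { // assumed job parameter
        return new JdbcCursorItemReaderBuilder<Foo>()
                .name("fooCursorReader")
                .dataSource(dataSource)
                // Table and column names are assumptions; the ORDER BY keeps the read order stable
                .sql("SELECT id, status_code FROM foo_table WHERE status_code = ? ORDER BY id")
                .preparedStatementSetter(ps -> ps.setString(1, statusCode))
                // Maps columns to bean properties by name (status_code -> statusCode)
                .rowMapper(new BeanPropertyRowMapper<>(Foo.class))
                .build();
    }
}
```

The step would then use this bean in `.reader(...)` in place of the paging reader; the chunk size can stay as it is, because it no longer influences which rows are selected.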
Edited
Based on your code to configure the reader which you added, I assume a couple of things:
1. This is not a standard Spring Batch item reader.
2. You are using a method called "findAllByStatusCode". I assume that this is the status field that gets updated during writing.
3. Your reader class is named "NonSortingRepositoryItemReader", hence I assume that there is no guaranteed ordering in your result list.
If 3 is correct, then this is very likely the problem. If the order of the elements is not guaranteed, then using a paging reader will definitely not work. Every page executes its own select and then moves the pointer to the appropriate position in the result.
E.g., if you have a page size of 5, the first call will return elements 1-5 of its select, and the second call will return elements 6-10 of its select. But since the order is not guaranteed, the element at position 1 in the first call could be at position 6 in the second call and therefore be processed twice, whilst the element at position 6 in the first call could be at position 1 in the second call and therefore never be processed.
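The same example as a tiny plain-Java sketch. The two hard-coded lists are made-up stand-ins for what two executions of the same unordered select might return; in the second one, elements 1 and 6 have swapped places.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class UnorderedPagingDemo {
    /** Returns the given 0-based page of one query result. */
    static List<Integer> page(List<Integer> queryResult, int page, int pageSize) {
        int from = Math.min(page * pageSize, queryResult.size());
        int to = Math.min(from + pageSize, queryResult.size());
        return new ArrayList<>(queryResult.subList(from, to));
    }

    /** Reads page 0 from the first execution and page 1 from the second. */
    static List<Integer> readAll() {
        // Hypothetical row order returned by the first select
        List<Integer> firstQuery = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        // Hypothetical row order returned by the second select: 1 and 6 swapped
        List<Integer> secondQuery = Arrays.asList(6, 2, 3, 4, 5, 1, 7, 8, 9, 10);

        List<Integer> read = new ArrayList<>(page(firstQuery, 0, 5));
        read.addAll(page(secondQuery, 1, 5));
        return read;
    }

    public static void main(String[] args) {
        System.out.println(readAll()); // [1, 2, 3, 4, 5, 1, 7, 8, 9, 10]
    }
}
```

Element 1 is read twice and element 6 is never read, even though nothing in the table was modified between the two calls.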
Upvotes: 2