Dead Programmer

Reputation: 12575

spring batch - Partition step: roll back all previous chunk commits when a chunk fails

I am using Spring Batch to process multiple files with a MultiResourcePartitioner, and all the item readers and writers are step-scoped. Each partitioned step processes an individual file and commits to the database at an interval of 1000. When any error occurs during processing, all previous commits need to be rolled back and the step should fail, so that none of the file's contents end up in the database.

What is the best way to achieve this?

My current XML configuration is shown below:

<bean id="filepartitioner" class="org.springframework.batch.core.partition.support.MultiResourcePartitioner">
    <property name="resources" value="classpath:${filepath}" />
</bean>

<bean id="fileItemReader" scope="step" autowire-candidate="false" parent="itemReaderParent">
        <property name="resource" value="#{stepExecutionContext[fileName]}" />
</bean>

<step id="step1" xmlns="http://www.springframework.org/schema/batch">
    <tasklet transaction-manager="ratransactionManager"   >
        <chunk writer="jdbcItenWriter" reader="fileItemReader" processor="itemProcessor" commit-interval="800" retry-limit="3">
         <retryable-exception-classes>
        <include class="org.springframework.dao.DeadlockLoserDataAccessException"/>
     </retryable-exception-classes>
    </chunk>
    <listeners>
        <listener ref="customStepExecutionListener">
        </listener>
    </listeners>
    </tasklet>
    <fail on="FAILED"/>
</step>
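
The partitioned master step that fans step1 out over the files is wired along these lines (only a sketch: the grid-size value and the taskExecutor bean are placeholders, not part of my real configuration):

<step id="masterStep" xmlns="http://www.springframework.org/schema/batch">
    <!-- runs one step1 execution per file resolved by filepartitioner -->
    <partition step="step1" partitioner="filepartitioner">
        <handler grid-size="10" task-executor="taskExecutor" />
    </partition>
</step>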

UPDATE:

It turns out that the main table (where the direct inserts happen) is referenced by other tables and by materialized views. If I delete rows from this table to remove stale records using a processed-indicator column, the data spooled through the materialized views will still show the old data. So I think a staging table is needed for my requirement.

Should I implement a staging table for this requirement, or are there any other suggestions?

Upvotes: 2

Views: 6286

Answers (3)

Yair Zaslavsky

Reputation: 4137

At the oVirt open source project, Mike Kolesnik, Eli Mesika and I implemented a full-fledged compensation mechanism.
You can clone the project and look at the classes related to CompensationContext.
I experimented with Spring Batch over the last few days, for the first time, and it looks like for a batch operation composed of the same CRUD operation type throughout - for example, a batch of inserts - having a helper column can help.
What I'm trying to understand is whether we can somehow intercept the job id and store it in the table that contains the inserted data (i.e. by adding a job_id column), or, for example, store a pair of (job_id, entity_id) in a separate table; the compensation when a job fails would then be to erase all entries belonging to that job.
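
The job id is in fact reachable through step-scope late binding, which exposes the current StepExecution and therefore the job execution id. A minimal sketch of the idea (the classes, bean names and the dataSource reference below are hypothetical):

<!-- Hypothetical step-scoped processor that stamps each item with the current job execution id,
     so every inserted row carries a job_id column. -->
<bean id="jobIdStampingProcessor" class="com.example.JobIdStampingProcessor" scope="step">
    <property name="jobExecutionId" value="#{stepExecution.jobExecutionId}" />
</bean>

<!-- Hypothetical tasklet used as a compensation step: it would issue
     "delete from import_table where job_id = ?" for the failed run. -->
<bean id="deleteByJobIdTasklet" class="com.example.DeleteByJobIdTasklet" scope="step">
    <property name="jobExecutionId" value="#{stepExecution.jobExecutionId}" />
    <property name="dataSource" ref="dataSource" />
</bean>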

Upvotes: 0

dma_k

Reputation: 10639

In general I agree with @MichaelLange's approach. But perhaps a separate table is too much... You could instead add a completed column to your import table: while it is false, the record belongs to a file that is currently being processed (or whose processing failed). After you have processed a file, you issue a simple update on the table (it should not fail, as you have no constraints on this column):

update import_table set completed = true where file_name = 'file001_chunk1.txt'

Before processing a file you should remove "stale" records:

delete from import_table where file_name = 'file001_chunk1.txt'
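
One way to wire these two statements into the partitioned step is a step-scoped listener that receives the partition's file name through the same late binding the reader already uses. A sketch, where FileImportStatusListener is a hypothetical StepExecutionListener that runs the delete before the step and the completed update after it finishes successfully:

<bean id="fileImportStatusListener" class="com.example.FileImportStatusListener" scope="step">
    <!-- same late binding as fileItemReader, so each partition touches only its own file -->
    <property name="fileName" value="#{stepExecutionContext[fileName]}" />
    <property name="dataSource" ref="dataSource" />
</bean>

The bean would then be registered next to customStepExecutionListener in the step's <listeners> element.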

This solution would be faster and easier to implement than nested transactions. Perhaps with this approach you will face table locks, but with an appropriate choice of isolation level this can be minimised. Optionally you may wish to create a view over this table to filter out the non-completed records (and add an index on the completed column):

create view import_view as select a, b, c from import_table where completed = true

In general I think nested transactions are not possible in this case, as the chunks can be processed in parallel threads, each thread holding its own transaction context. The transaction manager will not be able to start a nested transaction in a new thread, even if you somehow manage to create a "main transaction" in the "top" job thread.


Yet another approach is a continuation of the "temporary table" idea. The import process would create import tables and name them according to, for example, the date:

import_table_2011_10_01
import_table_2011_10_02
import_table_2011_10_05
...
etc

and a "super-veiw" that joins all these tables:

create view import_table as
select * from import_table_2011_10_01
union
select * from import_table_2011_10_02
union
select * from import_table_2011_10_05

After the import succeeds, the "super-view" should be re-created.

With this approach you will have difficulties with foreign keys on the import tables.


Yet another approach is to use a separate DB for the import and then feed the imported data from the import DB into the main one (e.g. transfer the binary data).

Upvotes: 4

Michael Pralow

Reputation: 6630

Can't you try a compensation strategy?

Some examples:

  • use a temporary or extra table for the data, and move the data to the business table only if the job succeeds
  • use a conditional flow to call a "deleteAllWrittenItems" step in case of a problem (see the sketch below)

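A sketch combining both ideas (the step and tasklet names are hypothetical, and the single import step stands in for the partitioned step from the question): chunks are written to a staging table, a follow-up step promotes the data to the business table on success, and the deleteAllWrittenItems step removes the staged rows and fails the job otherwise:

<job id="importJob" xmlns="http://www.springframework.org/schema/batch">
    <step id="importToStaging">
        <tasklet transaction-manager="ratransactionManager">
            <chunk reader="fileItemReader" writer="stagingItemWriter" commit-interval="800" />
        </tasklet>
        <next on="FAILED" to="deleteAllWrittenItems" />
        <next on="*" to="moveToBusinessTable" />
    </step>
    <step id="moveToBusinessTable">
        <!-- hypothetical tasklet: insert into the business table from staging, then clear staging -->
        <tasklet ref="moveStagingDataTasklet" />
    </step>
    <step id="deleteAllWrittenItems">
        <!-- hypothetical tasklet: delete the rows staged by this run -->
        <tasklet ref="deleteStagingDataTasklet" />
        <fail on="*" />
    </step>
</job>
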
Upvotes: 1
