Reputation: 567
I need to execute seven distinct processes sequentially (one after the other). The data is stored in MySQL. I am thinking of the following options; please correct me if I am wrong, or suggest a better solution.
Requirements:
Read the data from the DB, run the seven processes (data validation, calculation1, calculation2, etc.), and finally write the processed data back into the DB.
The data needs to be processed in chunks.
My solution and issues:
Data read:
Process:
1) Define seven steps and chunks, and share the data between the steps using a data bean. But this won't be a good idea: because the data is processed in chunks, after each chunk the step 1 writer creates a new set of data in the data bean, and when that data bean is shared across the other steps, data integrity becomes an issue.
2) Use the StepExecutionContext to share the data between the steps (along the lines of the sketch after this list). But this may affect performance, as it involves the batch job repository.
3) Define only one step, with one ItemReader, a chain of processors (the seven processes), and one ItemWriter that writes the processed data into the DB. But then I won't be able to administer or monitor each of the different processes; all will be in one step.
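By option 2 I mean something like the following sketch, where a listener promotes keys from the StepExecutionContext to the JobExecutionContext so that the next step can read them ("processedData" is just a placeholder key, not from my actual job):

<bean id="promotionListener"
      class="org.springframework.batch.core.listener.ExecutionContextPromotionListener">
    <property name="keys" value="processedData" />
</bean>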
Upvotes: 1
Views: 5883
Reputation: 2137
The org.springframework.batch.item.support.CompositeItemProcessor is an out-of-the-box component of the Spring Batch framework that would support your requirement, akin to your third option. It would allow you to do the following:
- keep the reading from the database (ItemReader) separate in your design/solution
- keep each individual processor's concerns and configuration separate
- allow any individual processor to filter an item out of the chunk by returning null, irrespective of the previous processors
The CompositeItemProcessor iterates over a list of delegates, so it's 'similar' to an action pattern. It's quite useful in the scenario you've described, and it still allows you to leverage the chunk benefits (exception handling, retry, commit policy, etc.).
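A minimal configuration sketch (the delegate bean names below are placeholders for your seven processors, to be replaced with your actual beans):

<bean id="compositeProcessor"
      class="org.springframework.batch.item.support.CompositeItemProcessor">
    <property name="delegates">
        <list>
            <ref bean="dataValidationProcessor" />
            <ref bean="calculation1Processor" />
            <ref bean="calculation2Processor" />
            <!-- ...and the remaining four processors, in order... -->
        </list>
    </property>
</bean>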
Upvotes: 5
Reputation: 5409
Suggestions:
1) Read the data using JdbcCursorItemReader.
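For example, a minimal sketch (the SQL, dataSource and rowMapper are placeholders to adapt to your schema):

<bean id="jdbcReader" class="org.springframework.batch.item.database.JdbcCursorItemReader">
    <property name="dataSource" ref="dataSource" />
    <property name="sql" value="SELECT id, col1, col2 FROM source_table" />
    <property name="rowMapper" ref="myRowMapper" />
</bean>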
All out-of-the-box components are a good choice because they already implement the ItemStream interface, which makes your steps restartable. But, as you mention, sometimes the requirement is just too complex or, as in my case, you already have a service or DAO that you can reuse.
I would suggest you use the ItemReaderAdapter. It lets you configure a delegate service to call to get your data.
<bean id="MyReader" class="xxx.adapters.MyItemReaderAdapter">
    <property name="targetObject" ref="AnExistingDao" />
    <property name="targetMethod" value="next" />
</bean>
Note that the targetMethod must respect the read contract of ItemReaders (return null when there is no more data).
If your job does not need to be restartable, you can simply use the class org.springframework.batch.item.adapter.ItemReaderAdapter.
But if you need your job to be restartable, you can create your own ItemReaderAdapter like this:
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemStream;
import org.springframework.batch.item.ItemStreamException;
import org.springframework.batch.item.adapter.AbstractMethodInvokingDelegator;

public class MyItemReaderAdapter<T> extends AbstractMethodInvokingDelegator<T> implements ItemReader<T>, ItemStream {

    private static final Log log = LogFactory.getLog(MyItemReaderAdapter.class);

    private static final String CONTEXT_COUNT_KEY = "count";

    private long currentCount = 0;

    /**
     * Delegates to the target method, passing the current count as argument.
     * @return the return value of the target method (null when there is no more data).
     */
    public T read() throws Exception {
        super.setArguments(new Long[] { currentCount++ });
        return invokeDelegateMethod();
    }

    @Override
    public void open(ExecutionContext executionContext) throws ItemStreamException {
        // On restart, resume from the count saved in the execution context.
        currentCount = executionContext.getLong(CONTEXT_COUNT_KEY, 0);
    }

    @Override
    public void update(ExecutionContext executionContext) throws ItemStreamException {
        // Save the current count so a restart can resume where it left off.
        executionContext.putLong(CONTEXT_COUNT_KEY, currentCount);
        log.info("Update stream current count: " + currentCount);
    }

    @Override
    public void close() throws ItemStreamException {
        // Nothing to close.
    }
}
Because the out-of-the-box ItemReaderAdapter is not restartable, you just create your own that also implements ItemStream.
2) Regarding the 7 steps vs. 1 step:
I would go with 1 step with a CompositeItemProcessor on this one. The 7-step options will only bring problems, IMO:
1) 7 steps with a data bean: your writers commit into a data bean until step 7; then the step 7 writer tries to commit to the real database and boom, an error! Everything is lost and the batch must restart from step 1!
2) 7 steps with the context: could be better, since the state will be saved in the Spring Batch metadata, BUT it is not good practice to store big data in the Spring Batch metadata!
3) One step with a composite processor is the way to go, IMO. ;-) (see the sketch below)
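To make option 3 concrete, the whole job collapses to a single chunk-oriented step. A minimal sketch, reusing the MyReader bean from above; compositeProcessor and myWriter are placeholder bean names, and the commit interval is something you would tune to your volumes:

<batch:job id="sevenProcessJob">
    <batch:step id="processStep">
        <batch:tasklet>
            <batch:chunk reader="MyReader"
                         processor="compositeProcessor"
                         writer="myWriter"
                         commit-interval="100" />
        </batch:tasklet>
    </batch:step>
</batch:job>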
Upvotes: 1