Reputation: 71
There are many different opinions on step chaining in Spring Batch, depending on the use case, so I want to know which of the two approaches makes more sense:
Chaining of Steps, i.e. a Job has a flow of Steps, where every Step has its own Reader, Processor & Writer. Data between Steps is exchanged using the Job ExecutionContext.
OR
Chaining of ItemProcessors, i.e. a job has only one step, but a flow of ItemProcessors.
The first possibility seems more reasonable to me, as the name 'Job' implies that several Steps are needed to finish it. The downside in many use cases could be redundant or sometimes 'empty' reading & writing at the start and end of a step. The second one is the more common solution, but I think this 'one step' approach isn't quite what batch processing is intended for.
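To make the two variants concrete, here is a rough sketch in Java config (assuming Spring Batch's JobBuilderFactory/StepBuilderFactory; the step, reader, processor and writer beans as well as the Input/Output types are just placeholders I made up):

```java
import java.util.Arrays;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.support.CompositeItemProcessor;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableBatchProcessing
public class ChainingVariantsConfig {

    // Placeholder item types, purely for illustration.
    public static class Input {}
    public static class Output {}

    // Variant 1: one Job with a chain of Steps, each Step with its own reader/processor/writer.
    @Bean
    public Job multiStepJob(JobBuilderFactory jobs,
                            Step selectStep, Step transformStep, Step persistStep) {
        return jobs.get("multiStepJob")
                .start(selectStep)
                .next(transformStep)
                .next(persistStep)
                .build();
    }

    // Variant 2: one Job with a single Step whose processor is a chain of ItemProcessors.
    @Bean
    public Job singleStepJob(JobBuilderFactory jobs, Step singleStep) {
        return jobs.get("singleStepJob").start(singleStep).build();
    }

    @Bean
    public Step singleStep(StepBuilderFactory steps,
                           ItemReader<Input> reader,
                           ItemProcessor<Input, Input> firstProcessor,
                           ItemProcessor<Input, Output> secondProcessor,
                           ItemWriter<Output> writer) {
        // CompositeItemProcessor runs the delegates in order on every item.
        CompositeItemProcessor<Input, Output> chain = new CompositeItemProcessor<>();
        chain.setDelegates(Arrays.asList(firstProcessor, secondProcessor));
        return steps.get("singleStep")
                .<Input, Output>chunk(100)
                .reader(reader)
                .processor(chain)
                .writer(writer)
                .build();
    }
}
```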
What's your opinion on this?
Upvotes: 1
Views: 160
Reputation: 96385
The usefulness of ItemProcessors is pretty limited; they're best for cases where you want to transform each item that you read in. You can also use them to filter out lines you don't want, but in some cases (e.g. when your reader executes a SQL query) that becomes wasteful fast: it's a lot more efficient if you can avoid reading those lines in the first place.
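A rough sketch of the difference (the Customer type, table and column names are made up): a processor filters by returning null, while pushing the condition into the reader's query avoids reading the unwanted rows at all:

```java
import javax.sql.DataSource;

import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.database.JdbcCursorItemReader;
import org.springframework.jdbc.core.BeanPropertyRowMapper;

public class FilteringExamples {

    // Filtering with an ItemProcessor: returning null tells Spring Batch to drop the item.
    public static class ActiveCustomerFilter implements ItemProcessor<Customer, Customer> {
        @Override
        public Customer process(Customer item) {
            return item.isActive() ? item : null; // null = filtered out
        }
    }

    // Usually cheaper when the reader runs SQL anyway: don't read the unwanted rows at all.
    public JdbcCursorItemReader<Customer> activeCustomerReader(DataSource dataSource) {
        JdbcCursorItemReader<Customer> reader = new JdbcCursorItemReader<>();
        reader.setDataSource(dataSource);
        reader.setSql("SELECT id, name, active FROM customer WHERE active = 1");
        reader.setRowMapper(new BeanPropertyRowMapper<>(Customer.class));
        return reader;
    }

    // Placeholder domain type, purely for illustration.
    public static class Customer {
        private long id;
        private String name;
        private boolean active;
        public long getId() { return id; }
        public void setId(long id) { this.id = id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public boolean isActive() { return active; }
        public void setActive(boolean active) { this.active = active; }
    }
}
```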
It's nice to have a hook in the process where you can drop in ItemProcessors, but I wouldn't overuse it. Most non-trivial jobs seem to have multiple steps, and the framework provides support for steps such as error handling, chunking, partitioning, etc. ItemProcessors, by comparison, are extremely lightweight, and the framework doesn't provide any support for them beyond giving them a place in the workflow.
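For instance, a sketch of the step-level support mentioned above, i.e. chunking plus skip/retry handled by the framework (the Payment type, step name and chosen exception classes are only examples):

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.file.FlatFileParseException;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.dao.DeadlockLoserDataAccessException;

@Configuration
@EnableBatchProcessing
public class FaultTolerantStepConfig {

    // Placeholder item type, purely for illustration.
    public static class Payment {}

    @Bean
    public Step importPayments(StepBuilderFactory steps,
                               ItemReader<Payment> reader,
                               ItemWriter<Payment> writer) {
        return steps.get("importPayments")
                .<Payment, Payment>chunk(500)                   // chunking / commit interval
                .reader(reader)
                .writer(writer)
                .faultTolerant()
                .skip(FlatFileParseException.class)             // skip broken input lines...
                .skipLimit(10)                                  // ...up to a limit
                .retry(DeadlockLoserDataAccessException.class)  // retry transient DB errors
                .retryLimit(3)
                .build();
    }
}
```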
(The statement "Data between Steps is exchanged using the Job ExecutionContext" seems questionable. I've used it to hold things like counts of the number of lines read or written. It's not a good place to put anything much bigger than that.)
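A sketch of the kind of small bookkeeping meant here (the listener class and the key name are made up): a step listener that stores the step's write count in the job's ExecutionContext after the step finishes:

```java
import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.StepExecutionListener;

// Stores a small number (the step's write count) in the job ExecutionContext,
// which is about the size of data that context is suited for.
public class WriteCountListener implements StepExecutionListener {

    @Override
    public void beforeStep(StepExecution stepExecution) {
        // nothing to prepare
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        stepExecution.getJobExecution().getExecutionContext()
                .putLong("itemsWritten", stepExecution.getWriteCount());
        return stepExecution.getExitStatus();
    }
}
```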
Upvotes: 1
Reputation: 4444
I completely agree with the answers given by Nathan and lexicore.
But there is one remark I would like to add: I never exchange business data using the JobExecutionContext.
If I write a job that has several steps, then every step writes its business data to a file and the next step reads it from there.
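A rough sketch of that hand-off, assuming flat files (the file location, field names and the Order type are placeholders I made up):

```java
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor;
import org.springframework.batch.item.file.transform.DelimitedLineAggregator;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;

@Configuration
public class FileHandOffConfig {

    private static final String WORK_FILE = "work/selected-orders.csv";

    // Writer used by the first step: dump its business data to an intermediate file.
    @Bean
    public FlatFileItemWriter<Order> selectStepWriter() {
        BeanWrapperFieldExtractor<Order> extractor = new BeanWrapperFieldExtractor<>();
        extractor.setNames(new String[] {"id", "amount"});
        DelimitedLineAggregator<Order> aggregator = new DelimitedLineAggregator<>();
        aggregator.setFieldExtractor(extractor);

        FlatFileItemWriter<Order> writer = new FlatFileItemWriter<>();
        writer.setResource(new FileSystemResource(WORK_FILE));
        writer.setLineAggregator(aggregator);
        return writer;
    }

    // Reader used by the following step: pick the same file up again.
    @Bean
    public FlatFileItemReader<Order> transformStepReader() {
        DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
        tokenizer.setNames(new String[] {"id", "amount"});
        BeanWrapperFieldSetMapper<Order> fieldSetMapper = new BeanWrapperFieldSetMapper<>();
        fieldSetMapper.setTargetType(Order.class);

        DefaultLineMapper<Order> lineMapper = new DefaultLineMapper<>();
        lineMapper.setLineTokenizer(tokenizer);
        lineMapper.setFieldSetMapper(fieldSetMapper);

        FlatFileItemReader<Order> reader = new FlatFileItemReader<>();
        reader.setResource(new FileSystemResource(WORK_FILE));
        reader.setLineMapper(lineMapper);
        return reader;
    }

    // Placeholder domain type, purely for illustration.
    public static class Order {
        private long id;
        private double amount;
        public long getId() { return id; }
        public void setId(long id) { this.id = id; }
        public double getAmount() { return amount; }
        public void setAmount(double amount) { this.amount = amount; }
    }
}
```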
Moreover, in the company I'm working with, we have defined the STEPP pattern, which almost all of our batches follow.
STEPP stands for
Not every job has all mentioned phases. E.g., most of them don't have an enrich phase. Some just have a SELECT, TRANSFORM and a PERSIST step.
Often, each phase is implemented as a single step, which stores its data in a file that is read by the step that follows. Sometimes the whole job is just a single step, and sometimes a phase consists of several steps. It always depends on the size of the job.
We also use appropriate naming so that the different phases are clearly identifiable. For instance, our packages are named com.xy._1_select, com.xy._2_transform, etc. Using the numbers in the package names puts them directly in the right order in your IDE's project/package viewer.
Upvotes: 1
Reputation: 43651
From http://docs.spring.io/spring-batch/reference/html/domain.html#domain:
A Job has one to many steps, which has exactly one ItemReader, ItemProcessor, and ItemWriter.
So the Spring philosophy is chaining of steps.
Upvotes: 1