Reputation: 761
This is a tough one, but I am sure it is not unheard of.
I have two datasets, Countries and Demographics. The Countries dataset contains the name of a country and an ID pointing to its demographic data.
The Demographics dataset is hierarchical, running from the country down to the suburb.
Both of these datasets are pulled from a 3rd party on a weekly basis.
I need to split the demographics out into files, one for each country.
So far the steps I have are:
1. Pull Countries
2. Pull Demographics
3. (this is the part I need) Loop over the Countries dataset, calling a "Write Country Demographics to File" step for each country.

Is it possible to somehow repeat a step, passing in the current country ID?
EDIT: Added link to sample of PartitionHandler
Thanks, JBristow. The link below shows overriding the PartitionHandler to pass parameters using the addArgument method of a JavaTask object, but it looks like a lot of heavy lifting for the developer and not very "business problem specific", which is the goal of Spring Batch. http://www.activeeon.com/blog/all/integration/distribute-a-spring-batch-job-on-the-proactive-scheduler
I also saw, in your original link, section 7.4.3 (Binding Input Data to Steps), which is in the context of 7.4.2 (Partitioner). This looks very exciting:
<bean id="itemReader" scope="step"
class="org.spr...MultiResourceItemReader">
<property name="resource" value="#{stepExecutionContext[fileName]}/*"/>
</bean>
I don't suppose that anyone has some sample XML config of this in play?
Thanks in advance.
Upvotes: 3
Views: 11099
Reputation: 1725
Yes, check out the partitioning feature of spring-batch! http://static.springsource.org/spring-batch/reference/html-single/index.html#partitioning
Basically, it allows you to use a "partitioner" to create new execution contexts to pass to a handler that then does something with that information.
While partitioning was made for parallelization, its default concurrency is 1, so you can start small and ratchet it up to match the hardware at your disposal (see the sketch after the XML below). Since I assume that each country's data is not dependent on the others (at least in the download-demographics step), your job could make use of basic parallelization.
/EDIT: Adding example.
Here's what I do (more or less): First, the XML:
<beans>
    <batch:job id="jobName">
        <batch:step id="innerStep.master">
            <batch:partition partitioner="myPartitioner" step="innerStep"/>
        </batch:step>
    </batch:job>

    <bean id="myPartitioner" class="org.lapseda.MyPartitioner" scope="step">
        <property name="jdbcTemplate" ref="jdbcTemplate"/>
        <property name="runDate" value="#{jobExecutionContext['runDate']}"/>
        <property name="recurrenceId" value="D"/>
    </bean>

    <!-- the step each partition runs; its id must match step="innerStep" in the partition above -->
    <batch:step id="innerStep">
        <batch:tasklet>
            <batch:chunk reader="someReader" processor="someProcessor" writer="someWriter" commit-interval="10"/>
        </batch:tasklet>
    </batch:step>
</beans>
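If you want the partitions to actually run in parallel rather than one at a time, you hand the partition a task executor. A minimal sketch of what that change would look like (the taskExecutor bean and the grid-size of 4 are my own additions, not part of the job above):

<batch:step id="innerStep.master">
    <batch:partition partitioner="myPartitioner" step="innerStep">
        <!-- grid-size is only a hint passed to partition(int gridSize); the executor is what adds concurrency -->
        <batch:handler task-executor="taskExecutor" grid-size="4"/>
    </batch:partition>
</batch:step>

<bean id="taskExecutor" class="org.springframework.core.task.SimpleAsyncTaskExecutor"/>

SimpleAsyncTaskExecutor spawns a new thread per partition; swap in a pooled executor if you need to bound the thread count.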
And now some Java:
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;

public class MyPartitioner implements Partitioner {

    // fields and setters for jdbcTemplate, runDate and recurrenceId (wired in the XML above) omitted for brevity

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        List<String> list = getValuesToRunOver(); // e.g. query the ids to partition on via jdbcTemplate (method omitted here)
        // I use TreeMap because my partitions are ordered; HashMap should work if order isn't important
        Map<String, ExecutionContext> out = new TreeMap<String, ExecutionContext>();
        for (String item : list) {
            ExecutionContext context = new ExecutionContext();
            context.put("key", "value"); // add your own stuff!
            out.put("innerStep" + item, context);
        }
        return out;
    }
}
Then, inside your step, you just read from that context the same way you would read from a normal step or job context.
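For instance (my own sketch, not from the job above; "key" is whatever name the partitioner put into the ExecutionContext), a step-scoped bean can late-bind against the partition's context:

<bean id="someReader" scope="step"
      class="org.springframework.batch.item.file.FlatFileItemReader">
    <!-- resolved once per partition, against the context built in partition() -->
    <property name="resource" value="file:#{stepExecutionContext['key']}"/>
    <property name="lineMapper">
        <bean class="org.springframework.batch.item.file.mapping.PassThroughLineMapper"/>
    </property>
</bean>

The scope="step" is what makes the #{stepExecutionContext[...]} expression work; without it the bean would be created before any step execution exists to bind against.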
Upvotes: 12