Sabir Khan

Reputation: 10142

Extend partitioning to one more level

As per the image below from the Spring Batch documentation, a master step is partitioned into six slave steps, which are identical copies of the master.

[Image: partitioning overview from the Spring Batch documentation]

My question is: can I extend partitioning by one more level, or by N more levels? That is, can each of the six slaves become a master for N further slaves?

Use case: first we partition the data on a major criterion, then we partition it further on some other criterion within that major one.

E.g. first I launch slaves for the data of N clients based on client name, then for each client name I partition the data further based on office location.

Can this be done, or is it not supported?

EDIT: As per my coding experiments, it doesn't look doable due to StepExecutionContext issues. See this and this. We can't pass a StepExecutionContext from one step to another in a partitioning context.

Upvotes: 2

Views: 151

Answers (1)

Asoub

Reputation: 2371

(I should have asked for more details, so my answer might be unnecessarily long, depending on how much you already understand about partitioning in Spring Batch. I'll keep this thread in case I need to use partitioning.)

You could always spawn your own threads from your slaves and give them the parameters they need, but that would completely defeat the point of using a framework like Spring Batch.

This is not a direct solution to your problem: slaves don't spawn other slaves here, and in fact I don't think they should or can. But the Partitioner you write will simulate this behavior by giving each slave its own parameters (clientName and officeLocation) in its ExecutionContext, so each slave reads/processes/writes its own part.


If that wasn't clear:

I'm using this as an example: https://www.mkyong.com/spring-batch/spring-batch-partitioning-example/ so you'll need to read it to follow what I'm saying.

From what I understand of partitioning, each slave step has its own ExecutionContext, and in this context you put the parameters that are specific to that slave. You'll need to create a Partitioner that sets the specific values for each slave based on the gridSize.

In Mkyong's example, he sets gridSize to 10, which means he'll have 10 threads. He knows the ids go from 1 to 100, so he gives each thread the matching range of database ids:

for `thread1`, fromId:1 toId:10, 
for `thread2`, fromId:11 toId:20, 
for `thread3`, fromId:21 toId:30, 
etc.

He sets these values in the ExecutionContext, so each reader/processor/writer gets its own values for its processing (the select goes from fromId to toId, so each select gets its own part). If he wanted, he could have done something more dynamic: pass the total number of ids in the database to the Partitioner and compute fromId and toId from that total. It's highly customisable.
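To make the range idea concrete, here is a rough sketch of the "more dynamic" variant in plain Java. It is not Mkyong's code: the class name `RangePartitionSketch` and the `totalIds` parameter are made up for illustration, and a plain `Map` stands in for Spring Batch's `ExecutionContext` so it runs without Spring on the classpath. In a real job the same loop would sit inside your `Partitioner`'s `partition(int gridSize)` method and fill one `ExecutionContext` per slave.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: splits ids 1..totalIds into at most gridSize contiguous ranges.
// A Map<String, Integer> stands in for Spring Batch's ExecutionContext (assumption).
class RangePartitionSketch {

    static Map<String, Map<String, Integer>> partition(int totalIds, int gridSize) {
        if (totalIds < 0 || gridSize <= 0) {
            throw new IllegalArgumentException("totalIds must be >= 0 and gridSize > 0");
        }
        Map<String, Map<String, Integer>> result = new LinkedHashMap<>();
        // Ceiling division so the last range absorbs any remainder.
        int range = (totalIds + gridSize - 1) / gridSize;
        int fromId = 1;
        for (int i = 1; i <= gridSize && fromId <= totalIds; i++) {
            int toId = Math.min(fromId + range - 1, totalIds);
            Map<String, Integer> context = new LinkedHashMap<>();
            context.put("fromId", fromId);
            context.put("toId", toId);
            // One entry per slave, named like the threads in the list above.
            result.put("thread" + i, context);
            fromId = toId + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints one line per partition with its fromId/toId pair.
        partition(100, 10).forEach((name, ctx) -> System.out.println(name + " " + ctx));
    }
}
```

With `totalIds = 100` and `gridSize = 10` this gives the same ranges as the list above. If the total is not a multiple of the gridSize, the last range is simply shorter.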

For your case, you have two parameters to take care of (here it was just the id), and they're not numbers. Let's say you only had clientName: if you give your Partitioner a list (or array) of client names, you just have to create one ExecutionContext per clientName and set the name in it. With two parameters, you could use a more complex structure, like a List of a Client class (where each Client has a clientName and a list of String officeLocations). You then create one ExecutionContext per clientName per officeLocation. Each reader gets from its ExecutionContext which clientName and officeLocation to select on.

For example, if you have 3 clients and each has 2 locations, you'll end up with 6 ExecutionContexts (and so 6 slaves/threads). Then, in your reader, you just have to retrieve the clientName and officeLocation from the ExecutionContext and use them to select your entities (from the DB or wherever).

Creating the list of clients with their office locations could be done in a previous step and stored in the job context, so it is accessible in the whole job. If the gridSize needs to be the same as the number of threads created by Spring Batch, you can calculate it at the same time you create your list of clients and set it the same way.
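Here is roughly what I mean, as a plain Java sketch. The `Client` holder, the class name `ClientOfficePartitionSketch` and the key names are assumptions for illustration, and a `Map<String, String>` stands in for Spring Batch's `ExecutionContext` so the snippet runs on its own. In an actual job, this nested loop would live in a `Partitioner` and put the two values into one `ExecutionContext` per slave.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: flattens clients x office locations into one context per pair.
// A Map<String, String> stands in for Spring Batch's ExecutionContext (assumption).
class ClientOfficePartitionSketch {

    // Illustrative holder: one client name plus its office locations.
    static final class Client {
        final String clientName;
        final List<String> officeLocations;

        Client(String clientName, List<String> officeLocations) {
            this.clientName = clientName;
            this.officeLocations = officeLocations;
        }
    }

    static Map<String, Map<String, String>> partition(List<Client> clients) {
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        int index = 0;
        for (Client client : clients) {
            for (String office : client.officeLocations) {
                // The "second level" of partitioning is just a second key in the same context.
                Map<String, String> context = new LinkedHashMap<>();
                context.put("clientName", client.clientName);
                context.put("officeLocation", office);
                result.put("partition" + index, context);
                index++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Client> clients = Arrays.asList(
                new Client("clientA", Arrays.asList("Paris", "Lyon")),
                new Client("clientB", Arrays.asList("Delhi", "Pune")),
                new Client("clientC", Arrays.asList("Boston", "Austin")));
        // 3 clients x 2 locations each: one line per (client, office) pair.
        partition(clients).forEach((name, ctx) -> System.out.println(name + " " + ctx));
    }
}
```

The number of entries in the returned map is the number of slaves, so if you need a gridSize that matches, you can take it from the size of this map (or compute it when you build the client list).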

Upvotes: 1
