Reputation: 3416
I haven't found an answer to my question so far, so I'm giving it a try here.
Let's assume a Spring Batch application with remote partitioning. There's one master/manager application that partitions the dataset and sends the partitions to Kafka (across multiple topic partitions), and worker nodes consume from different Kafka partitions so they can run in parallel. So far so good.
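For reference, the manager side of such a setup might look like this (minimal sketch, assuming @EnableBatchIntegration from spring-batch-integration; in a real setup the requests channel would be bridged to Kafka, and the step names and SimplePartitioner are placeholders):

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.partition.support.SimplePartitioner;
import org.springframework.batch.integration.config.annotation.EnableBatchIntegration;
import org.springframework.batch.integration.partition.RemotePartitioningManagerStepBuilderFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.channel.DirectChannel;
import org.springframework.messaging.MessageChannel;

@Configuration
@EnableBatchIntegration
public class ManagerStepConfiguration {

    @Autowired
    private RemotePartitioningManagerStepBuilderFactory managerStepBuilderFactory;

    // Outbound channel for partition requests; in the setup described
    // above this would be bridged to a Kafka topic.
    @Bean
    public MessageChannel requests() {
        return new DirectChannel();
    }

    @Bean
    public Step partitioningStep() {
        return managerStepBuilderFactory.get("partitioningStep")
                .partitioner("workerStep", new SimplePartitioner()) // placeholder partitioner
                .gridSize(3)
                .outputChannel(requests())
                .build();
    }
}
```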
The question is: what happens if, while the workers are still processing the data and doing their own thing, the manager application suddenly crashes?
The obvious answer is that the partitioning job execution will stay in the STARTED state even though the respective worker jobs have the state COMPLETED.
How can I restart the master node without doing the partitioning again and re-triggering the workers? The only thing I'd want in this case is to mark that particular job execution as COMPLETED, since all the worker steps have completed.
I tried restarting the job with the JobOperator interface, roughly as sketched below, but it obviously fails since the job is in the STARTED state and not FAILED.
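A sketch of the attempt (jobOperator is an injected JobOperator bean, and executionId is assumed to hold the id of the stuck manager execution):

```java
import org.springframework.batch.core.launch.JobOperator;

// Attempted restart of the stuck partitioning job execution;
// this call produces the exception below.
Long restartedExecutionId = jobOperator.restart(executionId);
```

The restart fails with: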
```
Caused by: org.springframework.batch.core.UnexpectedJobExecutionException: Illegal state (only happens on a race condition): job execution already running with name=partitioningJob and parameters={}
```
Any suggestions are welcome. Thanks!
Upvotes: 2
Views: 593
Reputation: 31600
You can change the status of the job execution from STARTED to FAILED and set its END_TIME to a non-null value before restarting the same job instance (you might need to do the same for the step execution as well). On restart, the manager should notice that all workers have completed and will complete the execution.
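A minimal sketch of that fix in code, assuming Spring Batch 4.x (where execution timestamps are java.util.Date) and that JobExplorer and JobRepository beans are available; stuckExecutionId stands for the id of the manager job execution left in STARTED:

```java
import java.util.Date;

import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.explore.JobExplorer;
import org.springframework.batch.core.repository.JobRepository;

public class StuckExecutionFixer {

    private final JobExplorer jobExplorer;
    private final JobRepository jobRepository;

    public StuckExecutionFixer(JobExplorer jobExplorer, JobRepository jobRepository) {
        this.jobExplorer = jobExplorer;
        this.jobRepository = jobRepository;
    }

    // Marks a job execution that was left in STARTED as FAILED, with a
    // non-null end time, so that the job instance becomes restartable.
    public void markAsFailed(long stuckExecutionId) {
        JobExecution execution = jobExplorer.getJobExecution(stuckExecutionId);
        if (execution == null) {
            return; // no such execution, nothing to fix
        }
        execution.setStatus(BatchStatus.FAILED);
        execution.setEndTime(new Date()); // any non-null value will do

        // The manager's partitioning step execution may be stuck in
        // STARTED as well; mark it FAILED too.
        for (StepExecution stepExecution : execution.getStepExecutions()) {
            if (stepExecution.getStatus() == BatchStatus.STARTED) {
                stepExecution.setStatus(BatchStatus.FAILED);
                stepExecution.setEndTime(new Date());
                jobRepository.update(stepExecution);
            }
        }
        jobRepository.update(execution);
    }
}
```

The same change can also be made directly against the metadata tables (BATCH_JOB_EXECUTION and BATCH_STEP_EXECUTION with the default table prefix); after it, a restart via JobOperator#restart should go through and the manager can pick up the completed workers.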
Upvotes: 0