Reputation: 1
I have created a pipeline in ADF that executes a workflow in Databricks. When the pipeline fails, I need to restart it so that the workflow resumes from the failed task only. Any ideas on how to achieve this?
I tried calling the POST API used to execute the workflow again after the failure activity, but that isn't working.
Upvotes: 0
Views: 590
Reputation: 11529
You can use the Repair a job run REST API to re-run the workflow job from the failed task.
https://<databricks instance>.azuredatabricks.net/api/2.1/jobs/runs/repair
The first time you re-run the failed tasks after a workflow failure, there is no need to pass latest_repair_id in the body of the POST request. For each subsequent re-run, you need to pass the repair_id returned by the previous repair request as latest_repair_id.
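As a minimal sketch of that rule outside ADF, the request body could be built like this in Python. The function name `build_repair_body` and the run and repair IDs are hypothetical; only the field names `run_id`, `rerun_tasks`, and `latest_repair_id` come from the request bodies shown in this answer.

```python
import json


def build_repair_body(run_id, rerun_tasks, latest_repair_id=None):
    """Build the JSON body for POST /api/2.1/jobs/runs/repair.

    Hypothetical helper for illustration only.
    """
    body = {"run_id": run_id, "rerun_tasks": list(rerun_tasks)}
    # latest_repair_id is omitted on the first repair and included afterwards.
    if latest_repair_id is not None:
        body["latest_repair_id"] = latest_repair_id
    return body


# Made-up IDs: 12345 stands in for the failed job run id,
# 67890 for the repair_id returned by the first repair call.
first_body = build_repair_body(12345, ["task2"])
second_body = build_repair_body(12345, ["task2"], latest_repair_id=67890)

print(json.dumps(first_body))
print(json.dumps(second_body))
```

The only difference between the two bodies is the extra `latest_repair_id` field, which is what the second web activity has to supply.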
Here is a demo. My workflow job failed at task2.
For the first re-run, use a web activity with a body like this:
{"run_id":<Failed job run id>,"rerun_tasks":["<Failed task name1>","<Failed task name2>"]}
For the next re-run, pass the repair_id returned by the above web activity, @activity('Web rerun first time').output.repair_id. Here, I stored it in an integer variable and passed it to the next web activity:
@json(concat('{"run_id":<Failed job run id>,"rerun_tasks":["<Failed task name1>","<Failed task name2>"],"latest_repair_id":',string(variables('latest_repair_id')),'}'))
When I debugged the pipeline, the first web activity succeeded but the second one failed; its error output shows the reason for the failure.
So, if you re-run the same job multiple times from ADF, add a delay (for example, a Wait activity) between the repair calls that is longer than the execution time of the failed tasks, so that the next repair starts only after the previous repair has completed.
In my case the number of attempts is 2 (the original job run plus the first repair run), because the second web activity failed.
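Instead of a fixed delay, one alternative is to poll the run state and send the next repair only once the run has finished. This is a different technique from the fixed wait described above, sketched here in Python under several assumptions: the helper names are hypothetical, the workspace URL and token are supplied by the caller, authentication uses a bearer token, and the run reports a non-terminal `life_cycle_state` from the Jobs `runs/get` endpoint while a repair is still in progress.

```python
import json
import time
import urllib.request

# life_cycle_state values after which the run is no longer executing.
FINISHED_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def fetch_life_cycle_state(host, token, run_id):
    """Read state.life_cycle_state from GET /api/2.1/jobs/runs/get.

    host is the workspace URL, token a personal access token (both assumed).
    """
    url = f"{host}/api/2.1/jobs/runs/get?run_id={run_id}"
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["state"]["life_cycle_state"]


def wait_until_finished(get_state, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Call get_state() until it returns a finished state, then return it."""
    for _ in range(max_polls):
        state = get_state()
        if state in FINISHED_STATES:
            return state
        sleep(poll_seconds)
    raise TimeoutError("run did not finish within the polling budget")


# Example wiring (not executed here, since it needs a real workspace):
# state = wait_until_finished(
#     lambda: fetch_life_cycle_state(host, token, run_id)
# )
```

Inside ADF itself, a comparable loop could probably be built from an Until activity that wraps a Web activity and a Wait activity, though that variant was not tested here.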
Upvotes: 0