Reputation: 314
I need a way to kill a running KubernetesPodOperator task after a timeout. My DAG is scheduled to run every 15 minutes.
I tried adding dagrun_timeout and max_active_runs to the DAG arguments. I expected this to stop the DAG, kill its running tasks, and mark them as failed. What actually happens is that the DAG run is marked as failed but the tasks keep running, and because the DAG is scheduled every 15 minutes, the next DAG run gets triggered and continues even though the task from the previous DAG run is still running.
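For reference, a minimal sketch of the DAG-level settings described above, written as plain keyword arguments rather than a real `airflow.DAG` constructor call; the DAG id is illustrative:

```python
from datetime import timedelta

# DAG-level settings the question describes (illustrative names).
# dagrun_timeout fails the DagRun after 15 minutes, but on its own it
# does not guarantee the underlying Kubernetes pod gets killed.
dag_kwargs = dict(
    dag_id="my_pod_dag",                   # illustrative DAG id
    schedule_interval="*/15 * * * *",      # run every 15 minutes
    dagrun_timeout=timedelta(minutes=15),
    max_active_runs=1,                     # timeout only enforced at this cap
    catchup=False,
)
```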
Is there a way to solve this?
Upvotes: 1
Views: 7657
Reputation: 2094
We are still seeing this issue unfortunately. Another potential workaround could be to specify activeDeadlineSeconds for the pod itself:
https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/V1PodSpec.md#properties
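A sketch of what that could look like, using a plain dict that mirrors the V1PodSpec fields from the linked docs; the image name is illustrative, and how you hand the spec to KubernetesPodOperator (e.g. via a full pod spec or template, where your Airflow version supports it) is an assumption:

```python
# Plain-dict sketch of a pod spec with activeDeadlineSeconds set.
# Kubernetes terminates the pod once it has been active this long,
# independently of anything Airflow does.
pod_spec = {
    "activeDeadlineSeconds": 600,  # kill the pod after 10 minutes
    "restartPolicy": "Never",
    "containers": [
        {
            "name": "main",
            "image": "my-image:latest",  # illustrative image name
        }
    ],
}
```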
Upvotes: 0
Reputation: 21
The workaround I see is: when the operator times out, have a fallback that kills the specific running pods.
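A hedged sketch of such a fallback, assuming the pods are deletable by a label selector and that `kubectl` is available on the worker; the helper names, namespace, and label are all hypothetical:

```python
import subprocess

def build_pod_cleanup_cmd(namespace: str, label_selector: str) -> list:
    """Build a kubectl command that deletes the pods a timed-out
    operator left behind (hypothetical helper; the label depends on
    how your operator labels its pods)."""
    return [
        "kubectl", "delete", "pods",
        "-n", namespace,
        "-l", label_selector,
    ]

def on_timeout_cleanup(context=None):
    # Hypothetical on_failure_callback: Airflow invokes it when the
    # task fails (including on timeout); it deletes the matching pods.
    cmd = build_pod_cleanup_cmd("default", "dag_id=my_pod_dag")
    subprocess.run(cmd, check=False)
```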
Upvotes: 0
Reputation: 4051
As we discussed in the comment section, I am summarising our discussion as an answer in order to further help the community.
According to the documentation, the parameter dagrun_timeout specifies how long a DagRun should be up before timing out / failing, so that new DagRuns can be created. Note that the timeout is only enforced for scheduled DagRuns, and only once the number of active DagRuns equals max_active_runs.
As of today, there is an ongoing issue with Airflow 1.10.2. The reported issue description is: "Related to this, when we manually fail a task, the DAG task stops running, but the Pod in the DAG does not get killed and continues running." This matches your description. However, the current versions in Google Cloud Platform are Airflow 1.10.6 and Composer composer-1.10.5-airflow-1.10.6. For this reason, I strongly encourage you to update your environment.
Upvotes: 1
Reputation: 126
I think the configuration you used works at the DAG level: it times out and marks the DAG as failed. I would recommend using a task-level timeout instead; refer here:
execution_timeout (datetime.timedelta) – max time allowed for the execution of this task instance, if it goes beyond it will raise and fail.
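A minimal sketch of passing that parameter, using plain keyword arguments rather than a real operator import; the task id and timeout value are illustrative:

```python
from datetime import timedelta

# Task-level kwargs (illustrative): execution_timeout fails the task
# instance itself once it runs longer than the limit, which is what a
# DAG-level dagrun_timeout does not do for the individual task.
task_kwargs = dict(
    task_id="run_pod",                        # illustrative task id
    execution_timeout=timedelta(minutes=10),  # fail the task after 10 min
)
```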
Let me know if this is helpful!
Upvotes: 0