Joe

Reputation: 314

set a timeout for Airflow KubernetesPodOperator task

I need a way to kill a running KubernetesPodOperator task after a timeout; my DAG is scheduled to run every 15 minutes.

I tried adding dagrun_timeout and max_active_runs to the DAG arguments.

I expected this to stop the DAG, kill the running tasks, and mark them as failed.

What actually happened is that the DAG was marked as failed while the tasks kept running, and because the DAG is scheduled every 15 minutes, the next DAG run gets triggered and runs even though the task from the previous DAG run is still running.

Is there a way to solve this?

Upvotes: 1

Views: 7657

Answers (4)

Jesper

Reputation: 2094

Unfortunately, we are still seeing this issue. Another potential workaround is to specify activeDeadlineSeconds on the pod itself:

https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/V1PodSpec.md#properties
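A minimal sketch of the idea, assuming a KubernetesPodOperator version that accepts a full_pod_spec argument (available in newer cncf.kubernetes provider releases); active_deadline_seconds makes Kubernetes itself terminate the pod after the given number of seconds, independently of Airflow:

from kubernetes.client import models as k8s
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator

# Pod spec with a hard deadline enforced by Kubernetes rather than by Airflow.
pod_with_deadline = k8s.V1Pod(
    spec=k8s.V1PodSpec(
        active_deadline_seconds=600,  # Kubernetes kills the pod after 10 minutes
        containers=[
            k8s.V1Container(
                name="base",  # default container name used by the operator
                image="python:3.9-slim",
                command=["python", "-c", "import time; time.sleep(3600)"],
            )
        ],
    )
)

bounded_task = KubernetesPodOperator(
    task_id="bounded_pod",
    name="bounded-pod",
    namespace="default",
    full_pod_spec=pod_with_deadline,  # assumption: supported by your provider version
)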

Upvotes: 0

VikasReddy Ncs

Reputation: 21

The workaround I see is: when the operator times out, use a fallback task callback to kill the specific running pods.

https://airflow.apache.org/docs/apache-airflow-providers-dingding/stable/operators.html#sending-messages-from-a-task-callback
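A rough sketch of what that fallback could look like, using the official kubernetes Python client inside an on_failure_callback; the task_id label used in the selector is an assumption and depends on how your pods are actually labelled:

from kubernetes import client, config

def kill_task_pods(context):
    """on_failure_callback: clean up pods left behind by a timed-out task."""
    config.load_incluster_config()  # or config.load_kube_config() outside the cluster
    v1 = client.CoreV1Api()
    # Assumption: the operator labels its pods with the Airflow task id.
    selector = "task_id={}".format(context["task_instance"].task_id)
    pods = v1.list_namespaced_pod(namespace="default", label_selector=selector)
    for pod in pods.items:
        v1.delete_namespaced_pod(name=pod.metadata.name, namespace="default")

The callback can then be attached to the operator together with execution_timeout, so the leftover pods are removed whenever the task fails or times out.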

Upvotes: 0

Alexandre Moraes

Reputation: 4051

As we discussed in the comment section, I am summarising our discussion as an answer in order to further help the community.

According to the documentation, the parameter dagrun_timeout specifies how long a DagRun should be up before timing out / failing, so that new DagRuns can be created. In addition, the timeout is only enforced for scheduled DagRuns, and only once the number of active DagRuns == max_active_runs.
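For reference, a minimal sketch of how those two settings are declared together (the DAG id and values are illustrative):

from datetime import datetime, timedelta
from airflow import DAG

dag = DAG(
    dag_id="example_dag",
    start_date=datetime(2020, 1, 1),
    schedule_interval="*/15 * * * *",
    dagrun_timeout=timedelta(minutes=15),  # fail a scheduled DagRun after 15 minutes
    max_active_runs=1,  # the timeout is only enforced once active runs reach this limit
)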

As of today, there is an ongoing issue with Airflow 1.10.2. The reported issue description is: "Related to this, when we manually fail a task, the DAG task stops running, but the Pod in the DAG does not get killed and continues running." This description matches yours. However, the current versions in Google Cloud Platform are Airflow 1.10.6 and Composer composer-1.10.5-airflow-1.10.6. For this reason, I strongly encourage you to update your environment.

Upvotes: 1

Abdul

Reputation: 126

I think the configuration you have used is at the DAG level, which times out and marks the DAG as failed. I would recommend using a TASK-level timeout instead; refer here

execution_timeout (datetime.timedelta) – max time allowed for the execution of this task instance, if it goes beyond it will raise and fail.
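A minimal sketch: execution_timeout is a standard BaseOperator argument, so it can be passed straight to the KubernetesPodOperator (the import path shown is the Airflow 1.10 contrib one; the image and command are placeholders):

from datetime import timedelta
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

task = KubernetesPodOperator(
    task_id="my_task",
    name="my-task",
    namespace="default",
    image="python:3.9-slim",
    cmds=["python", "-c", "print('hello')"],
    execution_timeout=timedelta(minutes=10),  # fail this task instance after 10 minutes
)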

Let me know if this is helpful!

Upvotes: 0
