Norio Akagi
Norio Akagi

Reputation: 725

Airflow tasks get stuck at "queued" status and never gets running

I'm using Airflow v1.8.1 and run all components (worker, web, flower, scheduler) on kubernetes & Docker. I use Celery Executor with Redis and my tasks are looks like:

(start) -> (do_work_for_product1)
     ├  -> (do_work_for_product2)
     ├  -> (do_work_for_product3)
     ├  …

So the start task has multiple downstreams. And I setup concurrency related configuration as below:

parallelism = 3
dag_concurrency = 3
max_active_runs = 1

Then when I run this DAG manually (not sure if it never happens on a scheduled task) , some downstreams get executed, but others stuck at "queued" status.

If I clear the task from Admin UI, it gets executed. There is no worker log (after processing some first downstreams, it just doesn't output any log).

Web server's log (not sure worker exiting is related)

/usr/local/lib/python2.7/dist-packages/flask/exthook.py:71: ExtDeprecationWarning: Importing flask.ext.cache is deprecated, use flask_cache instead.
  .format(x=modname), ExtDeprecationWarning
[2017-08-24 04:20:56,496] [51] {models.py:168} INFO - Filling up the DagBag from /usr/local/airflow_dags
[2017-08-24 04:20:57 +0000] [27] [INFO] Handling signal: ttou
[2017-08-24 04:20:57 +0000] [37] [INFO] Worker exiting (pid: 37)

There is no error log on scheduler, too. And a number of tasks get stuck is changing whenever I try this.

Because I also use Docker I'm wondering if this is related: https://github.com/puckel/docker-airflow/issues/94 But so far, no clue.

Has anyone faced with a similar issue or have some idea what I can investigate for this issue...?

Upvotes: 35

Views: 104494

Answers (8)

Alexander Roberts
Alexander Roberts

Reputation: 11

I'm running Airflow version 2.4.3 and seemed to have got the same issue. But I resolved it by clearing the metadata database airflow db reset - not sure if this is the best solution, but just in case anyone wants a potentially quick way of resolving queued tasks that are not running.

PLEASE NOTE: resetting the metadata will remove all DAG history and previous runs. It also removes any users created as well.

Upvotes: 1

Kermit
Kermit

Reputation: 5982

My tasks were stuck in state:queued because I was hitting concurrency limits.

These are defined within ~/airflow/airflow.cfg

parallelism = 16
max_active_tasks_per_dag = 16
max_active_runs_per_dag = 16

Be aware that each task instance appears to be using 3 x 128 MB RAM on my machine

Upvotes: 0

Vzzarr
Vzzarr

Reputation: 5650

I arrived here after Googling and in my case with MWAA, my Airflow was running with limited resources quite many tasks. I observed several Airflow DAGs in a Queued State, so I thought it was an issue of resources.

Increasing the allocated resources to the Environment Class for my Airflow instance solved the issue: DAGs got unblocked and resumed working.

Upvotes: 2

Shams
Shams

Reputation: 3667

In my case, all Airflow tasks got stuck and none of them were running. Below are the steps I have done to fix it:

  1. Kill all airflow processes, using $ kill -9 <pid>
  2. Kill all celery processes, using $ pkill celery
  3. Increses count for celery's worker_concurrency, parallelism, dag_concurrency configs in airflow.cfg file.
  4. Starting airflow, first check if airflow webserver automatically get started, as in my case, it is running through Gunicorn otherwise start using $ airflow webserver &
  5. Start airflow scheduler $ airflow scheduler
  6. Start airflow worker $ airflow worker
  7. Try to run a job.

Upvotes: 1

Sheng Li
Sheng Li

Reputation: 31

Please try airflow scheduler, airflow worker command.

I think airflow worker calls each task, airflow scheduler calls between two tasks.

Upvotes: 3

Rohan Sawant
Rohan Sawant

Reputation: 31

I have been working on the same docker image puckel. My issue was resolved by :

Replacing

 result_backend = db+postgresql://airflow:airflow@postgres/airflow

with

celery_result_backend = db+postgresql://airflow:airflow@postgres/airflow

which I think is updated in the latest pull by puckel. The change was reverted around in Feb 2018 and your comment was made in January.

Upvotes: 3

Chengzhi
Chengzhi

Reputation: 2591

We have a solution and want to share it here before 1.9 becomes official. Thanks for Bolke de Bruin updates on 1.9. in my situation before 1.9, currently we are using 1.8.1 is to have another DAG running to clear the task in queue state if it stays there for over 30 mins.

Upvotes: 3

Bolke de Bruin
Bolke de Bruin

Reputation: 750

Tasks getting stuck is, most likely, a bug. At the moment (<= 1.9.0alpha1) it can happen when a task cannot even start up on the (remote) worker. This happens for example in the case of an overloaded worker or missing dependencies.

This patch should resolve that issue.

It is worth investigating why your tasks do not get a RUNNING state. Setting itself to this state is first thing a task does. Normally the worker does log before it starts executing and it also reports and errors. You should be able to find entries of this in the task log.

edit: As was mentioned in the comments on the original question in case one example of airflow not being able to run a task is when it cannot write to required locations. This makes it unable to proceed and tasks would get stuck. The patch fixes this by failing the task from the scheduler.

Upvotes: 10

Related Questions