Dotan

Reputation: 7632

Airflow: dag_id could not be found

I'm running an airflow server and worker on different AWS machines. I've synced the dags folder between them, ran airflow initdb on both, and checked that the dag_ids are the same when I run airflow list_tasks <dag_id>.

When I run the scheduler and worker, I get this error on the worker:

airflow.exceptions.AirflowException: dag_id could not be found: . Either the dag did not exist or it failed to parse. [...] Command ...--local -sd /home/ubuntu/airflow/dags/airflow_tutorial.py'

The problem seems to be that the path there is wrong (/home/ubuntu/airflow/dags/airflow_tutorial.py), since the correct path is /home/hadoop/...

On the server machine the path does use ubuntu, but in both config files it's simply ~/airflow/...

What makes the worker look in this path and not the correct one?

How do I tell it to look in its own home dir?

edit:

Upvotes: 19

Views: 33563

Answers (4)

lesleyww

Reputation: 11

  1. Modify dags_folder in airflow.cfg to point to the directory you want.
  2. Use the -sd parameter when listing your Airflow tasks, e.g. airflow list_tasks yourDagName -sd /opt/airflow/dags

Both worked for me.
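For illustration, a minimal sketch of both options; the /opt/airflow/dags path is just an example, substitute your own dags directory:

    # Option 1: set the dags directory in airflow.cfg
    [core]
    dags_folder = /opt/airflow/dags

    # Option 2: pass the dags directory explicitly on the command line
    airflow list_tasks yourDagName -sd /opt/airflow/dags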

Upvotes: 1

gcbenison

Reputation: 11963

I'm experiencing the same thing: the worker process appears to pass an -sd argument corresponding to the dags folder on the scheduler machine, not on the worker machine (even if dags_folder is set correctly in the Airflow config file on the worker). In my case I was able to get things working by creating a symlink on the scheduler host so that dags_folder can be set to the same value on both machines. (In your example, this would mean creating a symlink /home/hadoop -> /home/ubuntu on the scheduler machine, and then setting dags_folder to /home/hadoop.) So this is not really an answer to the problem, but it is a viable workaround in some cases.
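In shell terms the workaround looks roughly like this (paths adapted from the question; treat it as a sketch, not a tested recipe):

    # on the scheduler machine: make /home/hadoop resolve to /home/ubuntu
    sudo ln -s /home/ubuntu /home/hadoop

    # then airflow.cfg can use the same prefix on both machines
    # (the exact subpath depends on where your dags actually live)
    [core]
    dags_folder = /home/hadoop/airflow/dags

The point of the symlink is just that both machines can then agree on a single dags_folder value, so the path the scheduler passes via -sd also exists on the worker.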

Upvotes: 6

Michael Spector

Reputation: 37004

Adding the --raw parameter to the airflow run command helped me see the original exception. In my case, the metadata database instance was too slow, and loading DAGs failed because of a timeout. I fixed it by:

  • Upgrading the database instance
  • Increasing the dagbag_import_timeout parameter in airflow.cfg (sketched below)
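Concretely, that was along these lines (the timeout value here is just an example):

    # see the underlying exception by running the task directly
    airflow run <dag_id> <task_id> <execution_date> --raw

    # airflow.cfg: allow more time for DAG parsing (seconds)
    [core]
    dagbag_import_timeout = 120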

Hope this helps!

Upvotes: 20

Priyank Mehta

Reputation: 2513

Have you tried setting the dags_folder parameter in the config file to point explicitly to /home/hadoop/, i.e. the desired path?

This parameter controls where Airflow looks for DAGs.
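For example, in airflow.cfg on the worker machine (the exact path is an assumption about where your dags live):

    [core]
    # point this at the dags directory that actually exists on the worker
    dags_folder = /home/hadoop/airflow/dags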

Upvotes: 2
