Jeremy Lewi

Reputation: 6776

Airflow not loading dags in /usr/local/airflow/dags

Airflow seems to be skipping the dags I added to /usr/local/airflow/dags.

When I run

airflow list_dags

The output shows

[2017-08-06 17:03:47,220] {models.py:168} INFO - Filling up the DagBag from /usr/local/airflow/dags


-------------------------------------------------------------------
DAGS
-------------------------------------------------------------------
example_bash_operator
example_branch_dop_operator_v3
example_branch_operator
example_http_operator
example_passing_params_via_test_command
example_python_operator
example_short_circuit_operator
example_skip_dag
example_subdag_operator
example_subdag_operator.section-1
example_subdag_operator.section-2
example_trigger_controller_dag
example_trigger_target_dag
example_xcom
latest_only
latest_only_with_trigger
test_utils
tutorial

But this doesn't include the dags in /usr/local/airflow/dags

ls -la /usr/local/airflow/dags/
total 20
drwxr-xr-x 3 airflow airflow 4096 Aug  6 17:08 .
drwxr-xr-x 4 airflow airflow 4096 Aug  6 16:57 ..
-rw-r--r-- 1 airflow airflow 1645 Aug  6 17:03 custom_example_bash_operator.py
drwxr-xr-x 2 airflow airflow 4096 Aug  6 17:08 __pycache__

Is there some other condition that needs to be satisfied for Airflow to identify a DAG and load it?

Upvotes: 36

Views: 73622

Answers (15)

jo-L

Reputation: 36

It is common for dags not to appear because of the dag_discovery_safe_mode Airflow configuration option.

"If enabled, Airflow will only scan files containing both DAG and airflow (case-insensitive)."

Solution

Adding from airflow import DAG to your dag file (even if you don't need to use the DAG object) ensures airflow will recognize the job.
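
For example, a file that builds its DAG through a factory helper may contain neither keyword, so safe mode skips it. A minimal sketch (the factory module and dag name are hypothetical):

from my_company.pipeline_factory import build_pipeline  # hypothetical helper

# Unused import, but it puts both "airflow" and "DAG" into the file text,
# so the safe-mode substring check passes and the file gets parsed.
from airflow import DAG  # noqa: F401

globals()['etl_pipeline'] = build_pipeline('etl_pipeline')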

Walkthrough of airflow source code

I'm not an airflow developer, but here is a brief walkthrough of how safe mode works in Airflow's 2.10.4 source on GitHub:

  1. The path to the directory for your dags is collected from your settings (link)
  2. The path is passed to the DagBag class's __init__ method (link)
  3. Inside the __init__ method, the safe_mode config setting is determined (link)
  4. The path to the dag directory and the safe_mode setting are passed to the collect_dags method (link)
  5. The dags path and the safe mode setting are passed to a helper function (link), which passes them to another helper function (link), which passes them to yet another helper function (link); that one loads the might_contain_dag_callable (link), which defaults to reading the text of each file and checking it for the words "airflow" and "dag" (link)
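
That default heuristic effectively boils down to the following sketch (simplified, not the exact Airflow source):

def might_contain_dag(file_path: str) -> bool:
    # Read the raw file bytes and require both substrings,
    # case-insensitively. Files failing this check are never parsed.
    with open(file_path, 'rb') as f:
        content = f.read().lower()
    return all(token in content for token in (b'dag', b'airflow'))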

If there are any airflow devs reading this, does the fact that might_contain_dag_callable is loaded from the conf mean I can define a custom function somewhere to override this logic? I've searched the docs and I can't find anything...

Upvotes: 0

Surya A

Reputation: 11

The following points should resolve the issue of the sample dags appearing instead of your own dags.

  1. Edit airflow.cfg and set load_examples = False.

  2. Verify that dags_folder points to your dags folder.

  3. Restart the webserver and scheduler.

Note: dag_dir_list_interval in airflow.cfg decides how often (in seconds) to scan the DAGs directory for new files, so it determines how quickly your newly added dags appear in the 'DAGS' list on the UI. The relevant settings are sketched below.
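
A sketch of the relevant airflow.cfg settings (the values here are illustrative):

[core]
dags_folder = /usr/local/airflow/dags
load_examples = False

[scheduler]
# How often (in seconds) to scan the DAGs directory for new files.
dag_dir_list_interval = 30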

Upvotes: 0

Bowrna

Reputation: 111

I had an issue loading only a dynamic DAG, and found that in Airflow version 2.4.2, modifying attributes like on_*_callback on a dynamic DAG caused an error. Please check this GitHub issue for more details.

https://github.com/apache/airflow/issues/30012

This was fixed in Airflow version 2.5.1.

Upvotes: 0

Rupesh Bansal

Reputation: 351

Try airflow db init (airflow initdb on older versions) before listing the dags. This is because airflow list_dags lists the dags present in the database (and not those in the folder you mentioned). Initialising the db creates entries for these dags in the database.

Make sure you have the environment variable AIRFLOW_HOME set to /usr/local/airflow. If this variable is not set, airflow looks for dags in the default home airflow folder, which might not exist in your case.
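
A minimal sketch of that setup, using the Airflow 1.x CLI from the question (2.x equivalents in comments):

export AIRFLOW_HOME=/usr/local/airflow
airflow initdb      # "airflow db init" on Airflow 2.x
airflow list_dags   # "airflow dags list" on Airflow 2.x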

Upvotes: 35

Galuoises

Reputation: 3283

You need to set up Airflow first and initialise the db:

export AIRFLOW_HOME=/myfolder
mkdir /myfolder/dags
airflow db init

You need to create a user too

airflow users create \
    --username admin \
    --firstname FIRST_NAME \
    --lastname LAST_NAME \
    --role Admin \
    --email [email protected]

If you have done it correctly, you should see airflow.cfg in your folder. There you will find the dags_folder setting, which shows the dags folder.

If you have saved your dag inside this folder, you should see it in the dag list:

airflow dags list

or in the UI after starting the webserver:

airflow webserver --port 8080

Otherwise, run airflow db init again.

Upvotes: 2

cornandme

Reputation: 47

In my case, a top-level print(something) in the dag file prevented the dag list from printing on the command line.

If the above solutions are not working, check whether there is a top-level print line in your dag file.
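
If you want to keep the print for direct runs, a sketch of the guard (the dag name is a placeholder):

from airflow import DAG

dag = DAG('my_dag')

if __name__ == '__main__':
    # Only runs when the file is executed directly, not when the
    # scheduler or CLI imports it, so it cannot pollute CLI output.
    print('debugging info')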

Upvotes: 0

Surya Venkatapathi

Reputation: 336

This will be the case if the airflow.cfg config points to an incorrect path.

STEP 1: Go to {basepath}/src/config/

STEP 2: Open airflow.cfg file

STEP 3: Check the path; it should point to the dags folder you have created:

dags_folder = /usr/local/airflow/dags

Upvotes: 8

Yogesh Awdhut Gadade

Reputation: 2708

There can be two issues. 1. Check the DAG name given at the time of DAG object creation in the DAG Python program:

dag = DAG(
    dag_id='Name_Of_Your_DAG',
    ....)

Note that, often, the name given may be the same as a name already present in the list of DAGs (for example, if you copied the DAG code). If this is not the case, then: 2. Check the path set to the DAG folder in Airflow's config file. You can create the DAG file anywhere on your system, but you need to set the path to that DAG folder/directory in Airflow's config file.

For example, I created my DAG folder in the home directory, so I have to edit the airflow.cfg file using the following commands in the terminal:

Create a DAG folder in the home directory:

mkdir ~/DAG

Edit airflow.cfg, present in the airflow directory where Airflow is installed:

cd ~/airflow
nano airflow.cfg

In this file, change the dags_folder path to the DAG folder we have created.
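
For example, assuming a username of user (hypothetical), the line would read:

dags_folder = /home/user/DAG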

If you are still facing the problem, reinstall Airflow and refer to this link for the installation of Apache Airflow.

Upvotes: 2

Hutch

Reputation: 501

When I make changes to a dag in my dags folder, I find that I have to restart the scheduler for the UI to pick up the new dags. Updated dags appear in the list when I run airflow list_dags, just not in the UI, until I restart the scheduler.

First try running:

airflow scheduler

Upvotes: 4

AC at CA

Reputation: 735

The example files are not in /usr/local/airflow/dags. You can mute them by editing airflow.cfg (usually in ~/airflow) and setting load_examples = False in the 'core' section.

There are a couple of errors that may keep your DAG from being listed in list_dags.

  1. Your DAG file has a syntax issue. To check this, just run python custom_example_bash_operator.py and see if it raises any error.
  2. See if the folder is the default dag loading path. If you are new to Airflow, I suggest just creating a new .py file, copying the sample from https://airflow.incubator.apache.org/tutorial.html, and seeing if the test dag shows up.
  3. Make sure there is dag = DAG('dag_name', default_args=default_args) in the dag file. A minimal file that passes all three checks is sketched below.
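
This is only a sketch against the 1.x-era API used in the question; the dag name, start date, and task are placeholders:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {'start_date': datetime(2017, 8, 1)}

# The dag_id is what shows up in list_dags, not the filename.
dag = DAG('custom_example_bash_operator', default_args=default_args)

task = BashOperator(task_id='say_hello',
                    bash_command='echo hello',
                    dag=dag)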

Upvotes: 12

Bolke de Bruin

Reputation: 750

Can you share what is in custom_example_bash_operator.py? Airflow scans for certain magic inside a file to determine whether it is a DAG or not. It scans for the strings airflow and DAG.

In addition, if you use a duplicate dag_id for a DAG, it will be overwritten. Since you seem to be deriving from the example bash operator, did you perhaps keep the DAG name example_bash_operator? Try renaming it, for example:
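
dag = DAG('custom_example_bash_operator',  # instead of 'example_bash_operator'
          default_args=args)               # variable names are placeholders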

Upvotes: 1

SMDC

Reputation: 717

Does your custom_example_bash_operator.py have a DAG name different from the others? If yes, try restarting the scheduler, or even resetdb. I have also mistaken the filename for the dag name, so it is better to name them the same.

Upvotes: 1

Vamsi

Reputation: 52

Try restarting the scheduler. The scheduler needs to be restarted when new DAGs need to be added to the DagBag.

Upvotes: -9

Neil

Reputation: 849

dag = DAG(
    dag_id='example_bash_operator', 
    default_args=args,
    schedule_interval='0 0 * * *',
    dagrun_timeout=timedelta(minutes=60))

When a DAG is instantiated, it shows up under the name you specify in the dag_id attribute; dag_id serves as a unique identifier for your DAG.

Upvotes: 7

Jeremy Lewi

Reputation: 6776

My dag is being loaded, but I had the name of the DAG wrong. I was expecting the dag to be named after the file, but the name is determined by the first argument to the DAG constructor:

dag = DAG(
    'tutorial', default_args=default_args, schedule_interval=timedelta(1))

Upvotes: 41
