arbazkhan002

Reputation: 1333

Debugging Broken DAGs

When the Airflow webserver shows errors like Broken DAG: [&lt;path/to/dag&gt;] &lt;error&gt;, how and where can we find the full stack trace for these exceptions?

I tried these locations:

/var/log/airflow/webserver -- had no logs in the timeframe of execution; the other logs were binary, and decoding them with strings gave no useful information.

/var/log/airflow/scheduler -- had some logs, but they were in binary form; what I could read looked to be mostly sqlalchemy logs, probably for Airflow's database.

/var/log/airflow/worker -- shows the logs for running DAGs (the same ones you see on the Airflow page).

and then also under /var/log/airflow/rotated -- couldn't find the stack trace I was looking for.

I am using Airflow v1.7.1.3.

Upvotes: 33

Views: 53617

Answers (7)

gogstad

Reputation: 3739

You're looking for airflow dags list-import-errors:

https://airflow.apache.org/docs/apache-airflow/stable/cli-and-env-variables-ref.html#list-import-errors

Upvotes: 2

kabirbaidhya

Reputation: 3490

The accepted answer works in almost all cases to validate DAGs and debug errors, if any.

However, if you are running Airflow with docker-compose, you should do this:

docker-compose exec airflow airflow list_dags

This runs the same command inside the running container.

Upvotes: 0

mcmhav

Reputation: 1023

Reported Broken DAG errors and other import errors can also be retrieved through the Airflow REST API.

Example with curl:

# Assuming the airflow instance is running locally on port 8080, with 
# Airflow version 2.2.2
# AIRFLOW__API__AUTH_BACKEND=airflow.api.auth.backend.basic_auth
# user: admin
# password: test
# 
# Piping through `jq`, extracting only the stack_trace

curl -s \
  -H 'Content-Type: application/json' \
  --user "admin:test" \
  "http://localhost:8080/api/v1/importErrors" \
    | jq -r '.import_errors[].stack_trace'

You might need to enable access to the API first.
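If you'd rather post-process the response in Python instead of jq, the extraction step can be sketched with the standard library alone. A minimal sketch: the helper below mirrors the jq filter `.import_errors[].stack_trace`, and the sample payload is illustrative, trimmed to the fields used here (the real endpoint returns more):

```python
import json

def extract_stack_traces(payload: dict) -> list:
    """Mirror the jq filter '.import_errors[].stack_trace'."""
    return [e["stack_trace"] for e in payload.get("import_errors", [])]

# Illustrative payload in the shape /api/v1/importErrors returns,
# with the stack trace shortened for display.
sample = json.loads("""
{
  "import_errors": [
    {
      "filename": "/opt/airflow/dags/broken.py",
      "stack_trace": "Traceback (most recent call last): ..."
    }
  ],
  "total_entries": 1
}
""")

for trace in extract_stack_traces(sample):
    print(trace)
```

The same helper works on the parsed JSON body of the curl call above.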

Upvotes: 1

Shakeel

Reputation: 2015

I tried the following, step by step:

  • airflow list_dags, as mentioned by @Babcool -- this will print the stack trace.

If you are still not able to figure out the issue, run the task manually and look at the errors directly.

As a pre-step, set your environment variables:

export AIRFLOW_HOME="/airflow/project/path"
export PYTHONPATH="/airflow/project/path"

Then run the task:

airflow run dag_id task_id 2020-1-11

If things still aren't clear, you can try running the code line by line in a Python console and check the exact issue (after activating your virtual environment).

For example:

(venv) shakeel@workstation:~$ python
Python 3.7.9 (default, Aug 18 2020, 06:24:24) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from my_package.my_module import MyClass
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'my_package'
>>>
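The ModuleNotFoundError in the transcript above is usually the project root missing from the import path, which is exactly what the PYTHONPATH export fixes. A minimal sketch of the same effect, using a throwaway package built in a temp directory (my_package/my_module are the placeholder names from the transcript):

```python
import os
import sys
import tempfile

# Build a throwaway package layout: <tmp>/my_package/my_module.py
root = tempfile.mkdtemp()
pkg = os.path.join(root, "my_package")
os.makedirs(pkg)
open(os.path.join(pkg, "__init__.py"), "w").close()
with open(os.path.join(pkg, "my_module.py"), "w") as f:
    f.write("class MyClass:\n    pass\n")

# Without the project root on sys.path, the import fails...
try:
    from my_package.my_module import MyClass
except ModuleNotFoundError as err:
    print(err)  # No module named 'my_package'

# ...and succeeds once the root is added, which is what
# setting PYTHONPATH does for a fresh interpreter.
sys.path.insert(0, root)
from my_package.my_module import MyClass
print(MyClass.__name__)
```

Scheduler and worker processes need the same PYTHONPATH, or they will hit the same import error even when your console session works.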

Upvotes: 0

belka

Reputation: 1530

What you want to do is access the webserver's own logs, where the full stack trace appears. My Airflow server runs in a Docker image, so I'll use Docker to fetch these logs, but the idea is the same elsewhere.

  1. docker ps
  2. find the container ID of the webserver
  3. docker logs [CONTAINER_ID]
  4. read the full logs of the given Airflow webserver.

This should contain the exact information of why your DAG build failed.

Upvotes: 3

Chengzhi

Reputation: 2591

If you want to compile the file and see any syntax error, you can also try python your_dag.py.
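A plain python your_dag.py only surfaces errors that fire at import time, and it executes the module's top-level code to do so. The standard library's py_compile runs the same syntax check without executing anything. A minimal sketch on a deliberately broken throwaway file (the DAG content here is made up for illustration):

```python
import py_compile
import tempfile

# Write a "DAG" file with an obvious syntax error.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("def make_dag(:\n    pass\n")
    path = f.name

result = None
try:
    # doraise=True turns compile failures into PyCompileError
    # instead of printing them to stderr.
    py_compile.compile(path, doraise=True)
except py_compile.PyCompileError as err:
    result = err.exc_type_name

print("Compile check:", result)
```

Note this only catches syntax errors; import errors (missing modules, bad names) still require actually importing the file, as the other answers do.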

Upvotes: 12

Babcool

Reputation: 2193

Usually I use the command airflow list_dags, which prints the full stack trace for Python errors found in DAGs.

This works with almost any Airflow command, since Airflow parses the DAGs folder each time you use an Airflow CLI command.

Upvotes: 32
