Reputation: 1333
When the Airflow webserver shows errors like Broken DAG: [<path/to/dag>] <error>, how and where can we find the full stacktrace for these exceptions?
I tried these locations:
/var/log/airflow/webserver
-- had no logs in the timeframe of execution; the other logs were binary, and decoding them with strings gave no useful information.
/var/log/airflow/scheduler
-- had some logs, but they were in binary form; what I could read looked to be mostly sqlalchemy logs, probably for airflow's database.
/var/log/airflow/worker
-- shows the logs for running DAGs (the same ones you see on the airflow page)
and then also under /var/log/airflow/rotated
-- couldn't find the stacktrace I was looking for there either.
I am using airflow v1.7.1.3
Upvotes: 33
Views: 53617
Reputation: 3490
The accepted answer works in almost all cases to validate DAGs and debug any errors.
However, if you are using docker-compose to run Airflow, you should do this instead:
docker-compose exec airflow airflow list_dags
It runs the same command inside the running container.
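In case the Airflow service in your compose file has a different name (service names vary between setups, so treat this as a sketch), you can check it first and substitute it into the command:
docker-compose ps
docker-compose exec <your_airflow_service> airflow list_dags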
Upvotes: 0
Reputation: 1023
Reported Broken DAG errors and other importErrors can also be retrieved through the Airflow API.
Example with curl:
# Assuming the airflow instance is running locally on port 8080, with
# Airflow version 2.2.2
# AIRFLOW__API__AUTH_BACKEND=airflow.api.auth.backend.basic_auth
# user: admin
# password: test
#
# Piping through `jq`, extracting only the stack_trace
curl -s \
-H 'Content-Type: application/json' \
--user "admin:test" \
"http://localhost:8080/api/v1/importErrors" \
| jq -r '.import_errors[].stack_trace'
You might need to enable access to the API first.
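For reference, a minimal sketch of enabling the basic-auth backend referenced in the comment above (assuming Airflow 2.x), either in airflow.cfg or via the environment:
# airflow.cfg
[api]
auth_backend = airflow.api.auth.backend.basic_auth
# or equivalently as an environment variable
export AIRFLOW__API__AUTH_BACKEND=airflow.api.auth.backend.basic_auth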
Upvotes: 1
Reputation: 2015
I tried the following, step by step.
airflow list_dags
As mentioned by @Babcool, this will list the stacktrace. If you are still not able to figure out the issue, you can run the task manually and see the errors directly.
As a pre-step, set your environment variables:
export AIRFLOW_HOME="/airflow/project/path"
export PYTHONPATH="/airflow/project/path"
Running the dag:
airflow run dag_id task_id 2020-1-11
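A related option worth trying: airflow test runs a single task instance without recording state in the database, so import and runtime errors show up directly in the console:
airflow test dag_id task_id 2020-1-11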
If things still aren't clear, you can try running the code line by line in a Python console (after activating your virtual environment) and check the exact issue.
For example:
(venv) shakeel@workstation:~$ python
Python 3.7.9 (default, Aug 18 2020, 06:24:24)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from my_package.my_module import MyClass
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'my_package'
>>>
Upvotes: 0
Reputation: 1530
What you want to do is access the inner logs of the webserver so that you get the full stacktrace. My Airflow server runs in a Docker container, so I'll use Docker to fetch these logs, but the idea remains the same.
docker ps
docker logs [container_id]
This should contain the exact information of why your DAG build failed.
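For example, assuming the container listed by docker ps is the webserver, you can follow its logs and filter for the error (the grep pattern here is just an illustrative guess at what to search for):
docker logs -f [container_id] 2>&1 | grep -A 20 "Broken DAG"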
Upvotes: 3
Reputation: 2591
If you want to check for syntax or import errors, you can also try running python your_dag.py directly.
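Along the same lines, a minimal sketch using Airflow's DagBag to print the stacktrace for every file in a dags folder that failed to import (the folder path is a placeholder):
from airflow.models import DagBag

# parse the dags folder and collect any import errors
dagbag = DagBag(dag_folder="/path/to/dags", include_examples=False)
for filename, stacktrace in dagbag.import_errors.items():
    print(f"Broken DAG in {filename}:\n{stacktrace}")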
Upvotes: 12
Reputation: 2193
Usually I use the command airflow list_dags
which prints the full stacktrace for any Python error found in the dags.
That will work with almost any airflow command, as Airflow parses the dags folder each time you use an airflow CLI command.
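If you are on Airflow 2.x, the CLI was reorganised, so the equivalent commands should be (assuming a recent 2.x release):
airflow dags list
airflow dags list-import-errors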
Upvotes: 32