Mathias Andersen

Reputation: 526

Debugging Airflow Tasks with IDE tools?

My Airflow DAGs mainly consist of PythonOperators, and I would like to use my Python IDE's debug tools to develop Python "inside" Airflow. I rely on Airflow's database connectors, which I think would be ugly to move "out" of Airflow for development.

I have been using Airflow for a bit, and have so far only achieved development and debugging via the CLI, which is starting to get tiresome.

Does anyone know of a nice way to set up PyCharm, or another IDE, that enables me to use the IDE's debug toolset when running airflow test ..?

Upvotes: 16

Views: 16147

Answers (5)

montty

Reputation: 21

If you use Docker Compose and Airflow, the Python Debug Server works the same way. Start the containers as usual, create the run configuration, and install the required package in the Docker container (e.g. webserver/scheduler).
The IDE host name that worked for me was host.docker.internal, with any unused port. I also connect to the container and run the DAGs like this:

python dags/your_dag.py

DAG file:

dag = ... # generate or create your dag
if __name__ == "__main__":
  import pydevd_pycharm
  pydevd_pycharm.settrace('host.docker.internal', port=9673, stdoutToServer=True, stderrToServer=True)
  dag.test()

Don't forget to create path mappings in the run configuration.

Upvotes: 1

Dan Frank

Reputation: 1927

For VSCode, the following debug configuration runs Airflow under the built-in debugger:

    {
        "name": "Airflow Test - Example",
        "type": "python",
        "request": "launch",
        "program": "`pyenv which airflow`",  // placeholder: use the output of this command, or the path to airflow
        "console": "integratedTerminal",
        "args": [ // exact form depends on Airflow 1.x vs 2.x (2.x uses "tasks", "test", ...)
            "test",
            "mydag",
            "mytask",
            "`date +%Y-%m-%dT00:00:00`", // placeholder: the current date in this format
            "-sd",
            "path/to/mydag" // providing the subdirectory makes this faster
        ]
    }

I'd assume there are similar configs that work for other IDEs.
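Since the backticked values in the configuration above are placeholders that have to be filled in by hand, one option is to generate the configuration from Python. This is only a sketch: build_launch_config and render are hypothetical helper names, and the argument list simply mirrors the Airflow 1.x style form shown above.

```python
import json
import shutil
from datetime import date


def build_launch_config(dag_id, task_id, subdir, airflow_path=None, run_date=None):
    """Return a VS Code launch configuration dict mirroring the one above.

    Hypothetical helper: airflow_path and run_date exist only so the
    values can be overridden instead of being looked up.
    """
    # Resolve the airflow script here rather than inside launch.json.
    program = airflow_path or shutil.which("airflow")
    if program is None:
        raise FileNotFoundError("airflow executable not found on PATH")
    day = run_date or date.today()
    return {
        "name": f"Airflow Test - {dag_id}.{task_id}",
        "type": "python",
        "request": "launch",
        "program": program,
        "console": "integratedTerminal",
        # Same argument order as the hand-written config above.
        "args": [
            "test",
            dag_id,
            task_id,
            day.strftime("%Y-%m-%dT00:00:00"),
            "-sd",
            subdir,
        ],
    }


def render(config):
    """Serialize one configuration so it can be pasted into launch.json."""
    return json.dumps(config, indent=4)
```

Calling render(build_launch_config("mydag", "mytask", "path/to/mydag")) from the environment where airflow is installed should give an entry you can paste into the "configurations" list.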

Upvotes: 2

Blizz

Reputation: 8408

I might be a little late to the party, but I've been looking for a solution to this as well. I wanted to be able to debug code as close to "production mode" as possible (so nothing with test, etc.).

Found a solution in the form of the "Python Debug Server". It works the other way around: Your IDE listens and the connection is made from the remote script to your editor.

Just add a new run configuration of type "Python Debug Server". You'll get a screen telling you to pip install pydevd-pycharm remotely. On that same page you can fill in your local IP, a port on which the debugger should be available, and optional path mappings.

After that, just add the proposed 2 lines of code to where you want your debug session to start.

Run the configuration to activate the listener and if all is well your editor should break as soon as the location of the settrace-call is reached.


Edit/Note: Be aware that if you stop the configuration in your editor, Airflow will continue with the task.
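As a rough sketch of where the two lines could go, the snippet below wraps them in a helper that only connects when an environment variable is set, so the same task code can run without a listening IDE. The names attach_debugger_if_requested, my_task, AIRFLOW_DEBUG_HOST and AIRFLOW_DEBUG_PORT are made up for this example, and the default port is arbitrary. The settrace call itself is the one PyCharm proposes.

```python
import os


def attach_debugger_if_requested(env=None):
    """Connect to a listening PyCharm Python Debug Server if a host is configured.

    Returns True if settrace was called, False if debugging was not requested.
    """
    env = os.environ if env is None else env
    host = env.get("AIRFLOW_DEBUG_HOST")  # hypothetical variable name
    if not host:
        return False
    # Hypothetical variable; the default is arbitrary and must match the
    # port entered in the "Python Debug Server" run configuration.
    port = int(env.get("AIRFLOW_DEBUG_PORT", "9673"))
    # Imported lazily so the task still runs where pydevd-pycharm is not installed.
    import pydevd_pycharm

    pydevd_pycharm.settrace(host, port=port, stdoutToServer=True, stderrToServer=True)
    return True


def my_task(**context):
    """Hypothetical python_callable for a PythonOperator."""
    attach_debugger_if_requested()
    # Task logic goes here; the IDE should break once settrace has connected.
    return "done"
```

With this shape, leaving the host variable unset makes the task run normally, which avoids a failed connection attempt when the listener is not running.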

Upvotes: 13

brki

Reputation: 2780

I debug airflow test dag_id task_id, run on a vagrant machine, using PyCharm. You should be able to use the same method, even if you're running airflow directly on localhost.

PyCharm's documentation on this subject should show you how to create an appropriate "Python Remote Debug" configuration. When you run this config, it waits to be contacted by the bit of code that you've added someplace (for example in one of your operators). Then you can debug as normal, with breakpoints set in PyCharm.

Upvotes: 1

Mathias Andersen

Reputation: 526

It might be somewhat of a hack, but I found one way to set up PyCharm:

  • Use which airflow to locate the airflow script in the local Airflow environment, which in my case is just a pipenv
  • Add a new run configuration in PyCharm
  • Set the python "Script path" to said airflow script
  • Set Parameters to test a task: test dag_x task_y 2019-11-19

This has only been validated with the SequentialExecutor, which might be important.

It sucks that I have to change test parameters in the run configuration for every new debug/development task, but so far this is pretty useful for setting breakpoints and stepping through code while "inside" the local airflow environment.
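One possible way around editing the run configuration each time is a small launcher script that is debugged instead of the airflow script itself. This is a sketch I have not validated beyond the same setup: build_argv and run_airflow_test are hypothetical names, and it assumes the airflow entry point found on PATH is a Python script (which it needs to be for the "Script path" approach above to work).

```python
import runpy
import shutil
import sys


def build_argv(dag_id, task_id, execution_date):
    """Build the same parameters as the run configuration above."""
    return ["airflow", "test", dag_id, task_id, execution_date]


def run_airflow_test(dag_id, task_id, execution_date):
    """Run `airflow test ...` inside the current (debugged) process."""
    script = shutil.which("airflow")
    if script is None:
        raise FileNotFoundError("airflow script not found on PATH")
    # The airflow script reads its parameters from sys.argv.
    sys.argv = build_argv(dag_id, task_id, execution_date)
    # Executes the script as __main__ in this process, so breakpoints
    # set in the IDE are hit by the task code.
    runpy.run_path(script, run_name="__main__")
```

Debugging a scratch file that calls run_airflow_test("dag_x", "task_y", "2019-11-19") should then behave like the run configuration, with the DAG and task changed in code rather than in the IDE dialog.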

Upvotes: 4
