Marco Miduri

Reputation: 143

How to avoid DAG Import Errors in Apache Airflow for worker node dependencies?

I'm working on a container-based Apache Airflow application. My environment is made of the following components:

My understanding of this pattern is that I can have scheduler and webserver containers with just the dependencies Airflow itself needs, and a worker container (or several) with everything my DAGs need to run.

When I try to work this way (for instance, installing and using a module only in the worker node, say the crypto module), I get a DAG Import Error in the web UI that says: ModuleNotFoundError: No module named 'crypto'.

This makes sense to me: the scheduler sees that the DAG needs that module and raises an error. Despite this, the DAG actually runs correctly, because when it executes on the worker node, all the required dependencies are available.
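The behaviour described above can be reproduced outside Airflow. This sketch (plain Python, with a throw-away DAG-like file; the module name `no_such_worker_module` is invented, standing in for the question's crypto package) simulates what the scheduler does when it parses a DAG file whose top-level import is only installed on the workers:

```python
import importlib.util
import os
import tempfile
import textwrap

# A throw-away "DAG file" whose top-level import is a module that is not
# installed here -- 'no_such_worker_module' stands in for the question's
# worker-only 'crypto' package.
dag_source = textwrap.dedent("""
    import no_such_worker_module  # worker-only dependency
""")

dag_path = os.path.join(tempfile.mkdtemp(), "my_dag.py")
with open(dag_path, "w") as f:
    f.write(dag_source)

# The scheduler executes the whole file just to discover the DAGs in it,
# so the top-level import runs -- and fails -- at parse time, before any
# task ever reaches a worker.
spec = importlib.util.spec_from_file_location("my_dag", dag_path)
module = importlib.util.module_from_spec(spec)
try:
    spec.loader.exec_module(module)
except ModuleNotFoundError as e:
    print(f"Recorded as a DAG Import Error: {e}")
```

This is exactly the asymmetry in the question: parsing fails on the scheduler, while running the same code on a worker that has the package would succeed.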

How can I fix this?

Thanks

Upvotes: 3

Views: 13536

Answers (2)

Luiz Tauffer

Reputation: 632

The above answer by @kaxil is good but seems to be incomplete, at least according to the documentation here:

Airflow scheduler executes the code outside the Operator’s execute methods

This means you can avoid running into this sort of ImportError by changing top-level imports to local imports inside your Python callables. The referenced documentation explains it in more detail.
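Concretely, the fix looks like this (a sketch using the question's crypto module; `encrypt_task` is an invented callable name, and Airflow itself is left out so the snippet stays self-contained): move the import from the top of the DAG file into the function body, so it only runs when a worker executes the task.

```python
# Top-level imports stay limited to what the scheduler has installed.
# The worker-only dependency moves inside the callable.

def encrypt_task(**context):
    # Deferred import: evaluated only when the task runs on the worker,
    # never when the scheduler merely parses this file.
    import crypto  # hypothetical worker-only dependency from the question
    return crypto

# Parsing this file (all the scheduler ever does with it) raises nothing,
# even on a machine where 'crypto' is not installed:
print("parsed without ImportError")
```

In the real DAG, `encrypt_task` would be wired to a `PythonOperator` (or `@task`-decorated function) exactly as before; only the import line moves.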

Upvotes: 0

kaxil

Reputation: 18824

Currently, you will need to sync your dependencies on both Scheduler and Worker.

The scheduler parses DAG files in a separate process (one per DAG file), so if the dependencies used in your DAG file are not installed on the Scheduler, it records an ImportError in the database, which is then shown in the Webserver.


Upvotes: 2
