Reputation: 143
I'm working on an Apache Airflow, container-based application. My environment is made up of the following components, each in its own container: a scheduler, a webserver, and one or more workers.
My understanding of this pattern is that the scheduler and webserver containers can ship with just the bare Airflow dependencies, while the worker container(s) carry everything needed to run my DAGs.
When I try to work with it this way (for instance, installing and using a module only in the worker node, let's say the crypto module), I get a DAG Import Error in the front end that says the following:

```
ModuleNotFoundError: No module named 'crypto'
```
This makes sense to me: the scheduler sees that the module is needed for execution and, since it can't import it, throws the error. Despite that, the DAG itself runs correctly, because when it executes on the worker node all the required dependencies are present.
How can I fix this?
Thanks
Upvotes: 3
Views: 13536
Reputation: 632
The above answer by @kaxil is good but seems to be incomplete, at least according to the documentation here:

> Airflow scheduler executes the code outside the Operator's execute methods

This means that you can avoid running into this sort of ImportError by changing top-level imports into local imports inside your Python callables. The referenced documentation explains this in more detail.
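For example, here is a minimal sketch of that pattern (the DAG id, task, and callable are illustrative and assume Airflow 2.x import paths, not something from the original question):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def encrypt_payload():
    # Local import: the scheduler parses this file but never calls this
    # function, so 'crypto' only needs to be installed on the worker.
    import crypto  # hypothetical worker-only dependency

    # ... use crypto here ...


with DAG(
    dag_id="crypto_example",  # illustrative name
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
) as dag:
    PythonOperator(
        task_id="encrypt",
        python_callable=encrypt_payload,
    )
```

With the import moved inside the callable, the scheduler can parse the file without crypto installed; the import only happens on the worker at task runtime.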
Upvotes: 0
Reputation: 18824
Currently, you will need to sync your dependencies on both the Scheduler and the Worker.
The Scheduler parses DAG files in a separate process (one per DAG file), so if a dependency used in a DAG file is not installed on the Scheduler, it will record an ImportError in the DB, which is then shown in the Webserver.
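In a container-based setup like the one in the question, one way to keep the two in sync is to build the Scheduler and Worker from the same image. A minimal sketch, assuming a requirements.txt that lists crypto alongside your other DAG dependencies (the base image tag is illustrative):

```dockerfile
# One image shared by the Scheduler and the Worker, so every module
# imported at the top of a DAG file resolves in both places.
FROM apache/airflow:2.6.3
COPY requirements.txt /requirements.txt
RUN pip install --no-cache-dir -r /requirements.txt
```

Pointing both the scheduler and worker services at this single image (for example in a docker-compose file) keeps the two dependency sets from drifting apart.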
Upvotes: 2