Vortex

Reputation: 789

DAG dependencies, lazy loading

In Airflow, when a Python file gets imported, the scheduler reads all of my dependencies. Because my dependencies are slow and call native libraries, I would like to separate the DAG schedule from the actual tasks. I am planning to send the tasks to the cloud.

What is the right design in Airflow to import only the schedule and DAG definition, without importing task dependencies until they are actually used? What are the pros and cons?

Upvotes: 1

Views: 1158

Answers (2)

Mattia Gallegati

Reputation: 11

A little bit late to the party. As Tomasz said, it depends heavily on your DAG structure. That said, in order to avoid dependency resolution by the scheduler, you can use a design pattern that hides your dependencies until execution time.

If you are working with PythonOperator or PythonVirtualenvOperator, one approach is to move your imports into a support function inside the DAG. See https://airflow.apache.org/docs/apache-airflow/stable/best-practices.html#top-level-python-code

Generally speaking, if you want to call a Python function inside a wheel (.whl) package, you can build a "wrapper function" inside your DAG that imports the dependencies and then calls the actual function. From the PythonOperator you call the "wrapper function" rather than the target function directly. Hope this helps.
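A minimal sketch of the wrapper-function pattern described above. The package name `heavy_native_lib` and its `run_task` function are hypothetical stand-ins for your slow, native-library-backed dependency; only lightweight Airflow imports happen at parse time.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def run_heavy_task(**context):
    # The expensive import happens here, at execution time, so the
    # scheduler never pays the cost when it parses the DAG file.
    import heavy_native_lib  # hypothetical slow dependency

    return heavy_native_lib.run_task()


with DAG(
    dag_id="lazy_import_example",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(
        task_id="heavy_task",
        python_callable=run_heavy_task,  # the wrapper, not the target function
    )
```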

Upvotes: 0

Tomasz Urbaszek

Reputation: 808

It depends heavily on your DAG design.

In general, try to avoid any top-level code, both logic (reading variables, executing functions, etc.) and imports. This can be done by using some form of lazy evaluation.

In the case of imports, you can bake them into a function that is called only during execution (for example by creating a custom operator or using the PythonOperator).
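A minimal sketch of the custom-operator variant, assuming a hypothetical `heavy_native_lib` dependency: the slow import is deferred to `execute()`, which only runs on the worker, so DAG parsing stays fast.

```python
from airflow.models.baseoperator import BaseOperator


class HeavyTaskOperator(BaseOperator):
    """Runs a task whose dependency is too slow to import at parse time."""

    def __init__(self, table_name: str, **kwargs):
        super().__init__(**kwargs)
        self.table_name = table_name

    def execute(self, context):
        # Deferred import: only evaluated when the task actually executes.
        import heavy_native_lib  # hypothetical slow dependency

        return heavy_native_lib.process(self.table_name)
```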

Upvotes: 2
