Reputation: 11
I am running Airflow in a separate virtual env and running a couple of data quality DAGs with specific requirements. Effectively I want to run the DAGs in their own virtual envs rather than cluttering the base Airflow environment.
PythonVirtualenvOperator does something similar, but it creates its own environment every time and removes it afterwards. For a DAG that runs a couple of times a day, that is not efficient time- or space-wise.
I couldn't find a way to run the DAGs in separate virtual environments within the same Airflow installation. Is there any way to do it?
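For context, this is roughly the pattern I mean; a minimal sketch where the callable and the pinned requirement are just placeholders. The operator builds a throwaway virtualenv on every run and deletes it afterwards:

from airflow.operators.python import PythonVirtualenvOperator

def run_quality_checks():
    # imports must live inside the callable, because it executes in the freshly built venv
    import pandas
    print(pandas.__version__)

quality_check = PythonVirtualenvOperator(
    task_id="quality-check",
    python_callable=run_quality_checks,
    requirements=["pandas==1.5.3"],  # placeholder pin; installed into a new venv on every run
    system_site_packages=False,
)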
Upvotes: 0
Views: 2720
Reputation: 956
What I do now is something along the lines of this answer.
I basically have different conda environments and call them explicitly using the BashOperator. Here is an example:
import os

from airflow.operators.bash import BashOperator

# path_to_python points at the interpreter of the task's own conda environment,
# e.g. /opt/conda/envs/parsing/bin/python
parse_my_files = BashOperator(
    task_id='parse-files',
    bash_command=f"{path_to_python} {abs_path_code}/my_repository/scripts/"
                 f"report_processing/pipelines/parse.py",
    env={"PATH": os.environ["PATH"],
         "DB_URL": db_url},
)
To install up-to-date packages, you need to activate the environment and run your package dependency resolver. In our case this is done with poetry:
# conda run executes poetry inside the named environment, so the
# dependencies are resolved into that env rather than the Airflow one
install_dependencies = BashOperator(
    task_id=f"install-dependencies-{folder}",
    bash_command=f"cd {abs_path_code}/{folder}; "
                 f"conda run -n {env_name} poetry install",
)
It would be nice to have a Python operator that takes an existing environment and reuses it every time, but as I understand it, this is not on their to-do list.
Upvotes: 2
Reputation: 886
@Saradindu Sengupta
Have you considered utilising the DockerOperator? See the Official Airflow Docker Reference.
You could build an image with your specific requirements and execute the task via the DockerOperator.
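A minimal sketch of that idea, assuming the apache-airflow-providers-docker package is installed and that a hypothetical data-quality image has already been built with the DAG's requirements and scripts baked in:

from airflow.providers.docker.operators.docker import DockerOperator

parse_files_docker = DockerOperator(
    task_id="parse-files-docker",
    image="my-registry/data-quality:latest",   # hypothetical pre-built image
    command="python /app/pipelines/parse.py",  # script shipped inside the image
    docker_url="unix://var/run/docker.sock",   # local Docker daemon
    environment={"DB_URL": "postgresql://user:pass@host/db"},  # placeholder connection string
)

Each run starts a fresh container from the same image, so the environment is effectively reused without rebuilding it per run.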
Upvotes: 0