nolwww

Reputation: 1745

Airflow resets environment variables while running BashOperator

One of my Airflow tasks fails with an environment variable error:

[2019-08-19 04:51:04,603] {{bash_operator.py:127}} INFO -   File "/home/ubuntu/.pyenv/versions/3.6.7/lib/python3.6/os.py", line 669, in __getitem__
[2019-08-19 04:51:04,603] {{bash_operator.py:127}} INFO -     raise KeyError(key) from None
[2019-08-19 04:51:04,603] {{bash_operator.py:127}} INFO - KeyError: 'HOME'
[2019-08-19 04:51:04,639] {{bash_operator.py:131}} INFO - Command exited with return code 1

My task is defined as follows:

from airflow.models import Variable
from airflow.operators.bash_operator import BashOperator

task_name = BashOperator(
    task_id='task_name',
    bash_command="cd path/to/manage.py && export LC_ALL=C.UTF-8 && export LANG=C.UTF-8 "
                 f'&& {Variable.get("python_virtualenv_path")}virtual-env-name/bin/python manage.py command_name',
    retries=1,
    pool='LightAndFast',
    dag=dag
)

Any ideas what is causing this issue?

Upvotes: 0

Views: 5199

Answers (2)

SunnyAk

Reputation: 588

Ideally, you would put all the environment variables in a file under /etc/default/, e.g. /etc/default/airflow_vars.

airflow_vars would contain your environment variables, such as:

AIRFLOW_HOME="WHERE YOUR AIRFLOW WAS INSTALLED"
AIRFLOW_CONFIG="LOCATION OF YOUR AIRFLOW.CFG FILE"
PYTHONPATH="PATH WHERE YOUR REGULAR PYTHON SCRIPTS/UTILITIES ARE"

and so on.
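To make the file format concrete, here is a minimal Python sketch that reads KEY="VALUE" lines like the ones above into a dictionary, roughly what the service manager does before starting Airflow. `parse_env_file` and the sample paths are hypothetical, and this is a simplified parser: systemd's real EnvironmentFile= handling supports more syntax than this.

```python
from typing import Dict


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse KEY="VALUE" lines into a dict (simplified sketch)."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line; ignore it
        value = value.strip()
        # Strip one pair of matching surrounding quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        env[key.strip()] = value
    return env


# Hypothetical file contents, in the same shape as airflow_vars.
sample = '''
AIRFLOW_HOME="/opt/airflow"

AIRFLOW_CONFIG="/opt/airflow/airflow.cfg"
'''
print(parse_env_file(sample))
```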

If you use a process manager (systemd/supervisord), you would set up a unit file in /etc/systemd/system (this part applies to systemd only):

[Unit]
Description=Airflow Workflow daemon
After=network-online.target cloud-config.service
Requires=network-online.target

[Service]
EnvironmentFile=/etc/default/airflow_vars
User=
Group=
Type=simple
WorkingDirectory=
ExecStart= <location of airflow/bin/airflow webserver/scheduler/worker>
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target
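To confirm that the variables from the EnvironmentFile actually reached the process, a task can check for them up front instead of failing later with a KeyError like the one in the question. A minimal sketch; `missing_env_vars` and the variable names checked are hypothetical:

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_env_vars(required: Iterable[str],
                     environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names in `required` that are absent from `environ`.

    Defaults to the current process environment (os.environ).
    """
    env = os.environ if environ is None else environ
    return [name for name in required if name not in env]


# Example with an explicit mapping; in a real task you would call
# missing_env_vars([...]) with no second argument and fail early if it
# returns anything.
print(missing_env_vars(["HOME", "AIRFLOW_HOME"], {"HOME": "/home/ubuntu"}))
```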

Now all your environment variables are available to your Airflow processes.

Another way is to set them in the UI under Admin > Variables. Note that these are Airflow Variables, which DAG code reads with Variable.get(), not shell environment variables.

Here is the relevant page of the Airflow documentation:

https://airflow.readthedocs.io/en/stable/concepts.html?highlight=environmental%20variables#variables

Upvotes: 1

solomon1994

Reputation: 384

It is true that Airflow resets environment variables when using BashOperator; at least, I faced this issue. In the operator's documentation, available at https://airflow.apache.org/docs/stable/_modules/airflow/operators/bash_operator.html, I found a way to explicitly set the environment for the bash command, i.e.

bash_task = BashOperator(
    task_id="bash_task",
    bash_command='echo "here is the message: \'$message\'"',
    env={'message': '{{ dag_run.conf["message"] if dag_run else "" }}'},
)

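If env is set, it is used instead of inheriting the current process environment, so any variable it leaves out (such as HOME) is missing from the command. Hence I explicitly set the environment for the Bash command to a copy of the current one:

env=os.environ.copy(),

Make sure to import os earlier in the DAG file. This resolved the issue for me.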
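The effect can be reproduced outside Airflow with the standard library. BashOperator runs the command in a subprocess, and the sketch below uses subprocess.run directly (it does not import Airflow) to compare the two cases: an env containing only a custom variable, and an env built from a copy of the environment. `run_child`, `worker_env`, and the HOME value are hypothetical, chosen so the demo behaves the same on every machine.

```python
import os
import subprocess
import sys

# The child reads HOME the same way the failing manage.py command does.
CHILD = "import os; print(os.environ['HOME'])"


def run_child(env):
    """Run CHILD with exactly `env` as its environment."""
    return subprocess.run([sys.executable, "-c", CHILD], env=env,
                          capture_output=True, text=True)


# Stand-in for the worker's environment; HOME is forced to a hypothetical
# value so the result does not depend on the machine running the demo.
worker_env = dict(os.environ, HOME="/home/ubuntu")

# 1) env with only the custom variable: nothing else is inherited, so the
#    child has no HOME and fails with KeyError: 'HOME'.
replaced = run_child({"message": "hello"})
print(replaced.returncode, "KeyError: 'HOME'" in replaced.stderr)

# 2) env built from a copy of the environment plus the custom variable:
#    HOME survives and the child prints it.
merged = run_child({**worker_env, "message": "hello"})
print(merged.returncode, merged.stdout.strip())
```

In a DAG, the second case corresponds to passing env=os.environ.copy() as above, or a copy merged with your own variables if you also need custom ones.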

Upvotes: 2
