Reputation: 1745
With one of my Airflow tasks, I have an environment variable issue.
[2019-08-19 04:51:04,603] {bash_operator.py:127} INFO - File "/home/ubuntu/.pyenv/versions/3.6.7/lib/python3.6/os.py", line 669, in __getitem__
[2019-08-19 04:51:04,603] {bash_operator.py:127} INFO -     raise KeyError(key) from None
[2019-08-19 04:51:04,603] {bash_operator.py:127} INFO - KeyError: 'HOME'
[2019-08-19 04:51:04,639] {bash_operator.py:131} INFO - Command exited with return code 1
And my task is the following:
task_name = BashOperator(
    task_id='task_name',
    bash_command="cd path/to/manage.py && export LC_ALL=C.UTF-8 && export LANG=C.UTF-8 "
                 f'&& {Variable.get("python_virtualenv_path")}virtual-env-name/bin/python manage.py command_name',
    retries=1,
    pool='LightAndFast',
    dag=dag
)
Any ideas about this issue?
Upvotes: 0
Views: 5199
Reputation: 588
Ideally, you would place all the environment variables in a file within /etc/default/, e.g.
/etc/default/airflow_vars
airflow_vars would contain your environment variables, such as:
AIRFLOW_HOME="WHERE YOUR AIRFLOW WAS INSTALLED"
AIRFLOW_CONFIG="LOCATION OF YOUR AIRFLOW.CFG FILE"
PYTHONPATH="PATH WHERE YOUR REGULAR PYTHON SCRIPTS/UTILITIES ARE"
and so on.
If you use a service manager (systemd/supervisord), you would set up a unit file within /etc/systemd/system (this example is for systemd!):
[Unit]
Description=Airflow Workflow daemon
After=network-online.target cloud-config.service
Requires=network-online.target

[Service]
EnvironmentFile=/etc/default/airflow_vars
User=
Group=
Type=simple
WorkingDirectory=
ExecStart=<location of airflow/bin/airflow webserver/scheduler/worker>
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target
Now all your environment variables are available in your Airflow installation.
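As a quick illustration (a minimal sketch, not from the original answer; it assumes AIRFLOW_HOME was exported through the EnvironmentFile above), such variables can then be read from DAG or task code via os.environ:

import os

# AIRFLOW_HOME comes from /etc/default/airflow_vars once systemd exports it;
# the fallback path here is only a hypothetical default.
airflow_home = os.environ.get("AIRFLOW_HOME", "/home/ubuntu/airflow")
print(f"Airflow is installed at: {airflow_home}")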
Another way is to simply set them up in the UI, under the Admin tab, Variables selection.
This is the link to the Airflow documentation on this:
https://airflow.readthedocs.io/en/stable/concepts.html?highlight=environmental%20variables#variables
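For reference, a minimal sketch (not from the original answer) of reading such a UI-defined Variable inside a DAG file, using the same key the question's task already relies on:

from airflow.models import Variable

# "python_virtualenv_path" is the key the question's DAG reads;
# default_var is a hypothetical fallback used only when the Variable is unset in the UI.
venv_root = Variable.get("python_virtualenv_path", default_var="/home/ubuntu/venvs/")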
Upvotes: 1
Reputation: 384
It is true that Airflow resets the environment variables when using the BashOperator; at least, I faced this issue. In the documentation of the operator, available at
https://airflow.apache.org/docs/stable/_modules/airflow/operators/bash_operator.html,
I found the way to explicitly set the environment for the bash command, i.e.
bash_task = BashOperator(
    task_id="bash_task",
    bash_command='echo "here is the message: \'$message\'"',
    env={'message': '{{ dag_run.conf["message"] if dag_run else "" }}'},
)
Hence I explicitly set the environment for the bash command with env=os.environ.copy() (make sure to import os earlier in the DAG file), and it resolved the issue for me.
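Applied to the question's setup, a minimal sketch (the task id and command here are placeholders rather than the asker's actual values, and dag is assumed to be defined elsewhere in the file):

import os
from airflow.operators.bash_operator import BashOperator

# Pass a copy of the worker's environment (which includes HOME) to the bash command,
# so the os.environ['HOME'] lookup no longer raises KeyError inside the task.
task_name = BashOperator(
    task_id='task_name',
    bash_command='echo "$HOME"',
    env=os.environ.copy(),
    retries=1,
    dag=dag,
)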
Upvotes: 2