I'm receiving the error below in the task logs when running my DAG:
FileNotFoundError: [Errno 2] No such file or directory: 'beeline': 'beeline'
This is my DAG:
```python
from datetime import timedelta

from airflow import DAG
from airflow.providers.apache.hive.operators.hive import HiveOperator
from airflow.utils.dates import days_ago

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': days_ago(2),
    'email': ['[email protected]'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

dag_data_summarizer = DAG(
    'data_summarizer',  # placeholder dag_id: the original value is not shown
    default_args=default_args,
    description='Data summarizer DAG',
    schedule_interval='*/20 * * * *',
)

# HiveQL syntax is "IF NOT EXISTS" (the original read "if not exist")
hql_query = """create database if not exists new_test_db;"""

hive_task = HiveOperator(
    task_id='create_new_test_db',  # placeholder task_id: the original value is not shown
    hql=hql_query,
    hive_cli_conn_id='new_hive_conn',
    run_as_user="airflow",  # airflow user has beeline executable set in PATH
    dag=dag_data_summarizer,
)
```
The new_hive_conn connection is of type "hive_cli". I also tried a connection of type "hiveserver2", but that did not work either.
The task log shows the following command:
beeline -u "jdbc:hive2://hive-server-1:10000/default;auth=NONE"
When I run this command manually inside the worker Docker container, it works and I am connected to the Hive server.
The worker container has the beeline executable installed and on the PATH of both the "airflow" and "root" users.
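An interactive shell opened in the container and the process that runs the task do not necessarily start with the same environment, so it can help to log what the task itself sees. The following is a minimal diagnostic sketch, assuming a POSIX worker and that you call it from a Python task configured with the same run_as_user="airflow" (for example as a PythonOperator callable). describe_command_lookup is a hypothetical helper, not part of Airflow.

```python
import os
import shutil


def describe_command_lookup(cmd: str = "beeline") -> dict:
    """Report the effective uid, the PATH entries, and where `cmd` resolves (POSIX only)."""
    path = os.environ.get("PATH", "")
    return {
        "uid": os.geteuid(),
        "path_entries": [entry for entry in path.split(os.pathsep) if entry],
        "resolved": shutil.which(cmd),  # None when cmd is not found on PATH
    }


if __name__ == "__main__":
    for key, value in describe_command_lookup().items():
        print(f"{key}: {value}")
```

If "resolved" is None inside the task but a path in your interactive shell, the two environments have different PATH values.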
The 'run_as_user' feature uses 'sudo' to switch to the airflow user in non-interactive mode. In that mode, sudo will not preserve the PATH variable, no matter which parameters you pass (including -E). The user's PATH is only picked up when sudo starts a login shell (sudo -i, i.e. --login), because only then are the user's .profile, .bashrc and other startup scripts executed, and those scripts are usually what set PATH for the user.
Every non-interactive (non-login) 'sudo' command runs with PATH set to the secure_path value defined in the /etc/sudoers file.
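You can see this effect without involving sudo or Airflow by launching a program by its bare name with PATH replaced in the child's environment. This is only a simulation, assuming a POSIX system: make_fake_tool and try_run are hypothetical helpers, the fake "beeline" is a two-line shell script, and the secure_path-style value is an example rather than one read from your /etc/sudoers. When the tool's directory is missing from PATH, subprocess raises FileNotFoundError, the same exception type shown in the task log.

```python
import os
import stat
import subprocess
import tempfile


def make_fake_tool(directory: str, name: str) -> str:
    """Create a tiny executable shell script called `name` in `directory`."""
    tool = os.path.join(directory, name)
    with open(tool, "w") as fh:
        fh.write("#!/bin/sh\necho \"$0 ran\"\n")
    os.chmod(tool, os.stat(tool).st_mode | stat.S_IXUSR)
    return tool


def try_run(cmd: str, path: str) -> str:
    """Launch `cmd` by bare name with PATH replaced, mimicking sudo applying secure_path."""
    env = dict(os.environ, PATH=path)
    try:
        subprocess.run([cmd], env=env, check=True, capture_output=True)
        return "found"
    except FileNotFoundError as exc:
        return f"FileNotFoundError: {exc}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as user_bin:
        make_fake_tool(user_bin, "beeline")
        # PATH as the user's startup scripts might set it: includes the tool's directory.
        print(try_run("beeline", user_bin + os.pathsep + "/usr/bin:/bin"))
        # PATH replaced by a secure_path-style value that omits that directory.
        print(try_run("beeline", "/usr/sbin:/usr/bin:/sbin:/bin"))
```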
To fix it, either add the directory that contains beeline to secure_path in /etc/sudoers (edit the file with visudo), or copy/link beeline into one of the directories already listed in secure_path.
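Before editing sudoers, you can check whether beeline would resolve under the current secure_path and, if not, build the extended Defaults line to enter through visudo. This is a rough sketch under stated assumptions: it only understands the simple single-line form Defaults secure_path="...", and secure_path_from_sudoers, extend_secure_path, check_and_suggest and the /opt/hive/bin location are all hypothetical. Use the directory where beeline actually lives on your worker.

```python
import re
import shutil
from typing import Optional

# Matches only the simple form: Defaults secure_path="a:b:c" (quotes optional).
SECURE_PATH_RE = re.compile(r'^\s*Defaults\s+secure_path\s*=\s*"?([^"]*)"?\s*$')


def secure_path_from_sudoers(text: str) -> Optional[str]:
    """Return the last simple `Defaults secure_path=...` value in sudoers-style text."""
    value = None
    for line in text.splitlines():
        match = SECURE_PATH_RE.match(line)
        if match:
            # Keep the last match: a later Defaults setting overrides an earlier one.
            value = match.group(1).strip()
    return value


def extend_secure_path(secure_path: str, directory: str) -> str:
    """Append `directory` to a colon-separated secure_path unless it is already there."""
    entries = [entry for entry in secure_path.split(":") if entry]
    if directory not in entries:
        entries.append(directory)
    return ":".join(entries)


def check_and_suggest(cmd: str, sudoers_text: str, cmd_dir: str) -> str:
    """Report whether `cmd` resolves on secure_path; otherwise suggest a new Defaults line."""
    secure_path = secure_path_from_sudoers(sudoers_text)
    if secure_path is None:
        return "no secure_path found"
    found = shutil.which(cmd, path=secure_path)
    if found:
        return f"{cmd} already resolves to {found} under secure_path"
    return f'Defaults secure_path="{extend_secure_path(secure_path, cmd_dir)}"'


if __name__ == "__main__":
    sample = (
        "Defaults\tenv_reset\n"
        'Defaults\tsecure_path="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"\n'
    )
    # /opt/hive/bin is a hypothetical location of beeline on the worker.
    print(check_and_suggest("beeline", sample, "/opt/hive/bin"))
```

The script only prints a suggestion; writing the change through visudo keeps sudo's syntax check in the loop.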