Sam

Reputation: 2605

ImportError: 'code' is not a package, despite having the path in PYTHONPATH

I have a small problem with the Python path when starting an Airflow DAG. I am very new to Airflow.

My directories are laid out as follows. I have placed an __init__.py inside every directory:

-- ProjectA
    |-- code
        |__ module1.py
        |__ __init__.py
    |
    |-- dags
       |__ crawler.py   # Contains the bash operator to run a python module
       |__ __init__.py
    |
    |-- jobs
        |__ python_module.py # Contains a function that calls module1.py (the website-crawling code) inside the code package
        |__ __init__.py
    |
    |-- logs
    |
    |-- __init__.py

plus the other Airflow files.

My DAG implementation using BashOperator is given below.

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator



default_args = {
    'owner': 'Sam',
    'depends_on_past': False,
    'start_date': datetime(2018, 4, 26),
    'email': ['[email protected]'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=2)
}

dag = DAG('dump_aerial_image', default_args=default_args)

t1 = BashOperator(
    task_id='AerialDUMP',
    bash_command='python /Users/sam/App/ProjectA/jobs/python_module.py',
    dag=dag
)

When I run the DAG from the Airflow UI, I get the following error:

ImportError: No module named 'code.module1'; 'code' is not a package
[2018-04-25 18:45:07,699] {base_task_runner.py:98} INFO - Subtask: [2018-04-25 18:45:07,698] {bash_operator.py:105} INFO - Command exited with return code 1
[2018-04-25 18:45:07,707] {models.py:1595} ERROR - Bash command failed
Traceback (most recent call last):
  File "/Users/sam/App-Setup/anaconda/envs/anaconda35/lib/python3.5/site-packages/airflow/models.py", line 1493, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/Users/sam/App-Setup/anaconda/envs/anaconda35/lib/python3.5/site-packages/airflow/operators/bash_operator.py", line 109, in execute
    raise AirflowException("Bash command failed")
airflow.exceptions.AirflowException: Bash command failed

I am not sure how to debug this error. I even tried adding /Users/sam/App/ProjectA to my Python path, but that didn't work either. My Python path looks like this:

['/Users/sam/App/ProjectA/dags', '/Users/sam/App/ProjectA', '/Users/sam/App-Setup/anaconda/envs/anaconda35/lib/python35.zip', '/Users/sam/App-Setup/anaconda/envs/anaconda35/lib...........]

I am not sure how to get past this; any help will be appreciated.

Upvotes: 1

Views: 1640

Answers (2)

Amir Mamo

Reputation: 51

Try running python /Users/sam/App/ProjectA/jobs/python_module.py directly. If you get the same error, this is a Python issue, not an Airflow one.

Are you using the Celery executor? If so, is the environment variable defined for all executors?

In any case, you can try adding a sys.path.append call to python_module.py.
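
A minimal sketch of that last suggestion, assuming python_module.py sits one directory below the project root, as in the layout from the question. The helper name add_project_root is made up for illustration. It inserts at the front of sys.path instead of appending, on the assumption that the standard library's own code module would otherwise be found first, which would be consistent with the "'code' is not a package" message.

```python
import os
import sys


def add_project_root(script_path, levels_up=1):
    """Put the directory `levels_up` above the script's folder first on sys.path.

    Hypothetical helper: for jobs/python_module.py, levels_up=1 gives ProjectA.
    """
    root = os.path.dirname(os.path.abspath(script_path))
    for _ in range(levels_up):
        root = os.path.dirname(root)
    # Move the root to the front so a local package named `code` is searched
    # before the standard library module of the same name.
    if root in sys.path:
        sys.path.remove(root)
    sys.path.insert(0, root)
    return root


# Path taken from the question; inside the real script this would be __file__.
project_root = add_project_root("/Users/sam/App/ProjectA/jobs/python_module.py")
print(project_root)
```

Inside jobs/python_module.py the call would be add_project_root(__file__), placed before the import of code.module1.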

Upvotes: 1

tobi6

Reputation: 8249

There is a lot going on here.

  • There is no import for code.module1, but an import error occurs in the log
  • BashOperator is being used to execute a Python task, not PythonOperator
  • The bash command fails but that doesn't seem to be connected to the module

So I'd suggest:

  • First, try getting the import right with a non-Airflow Python script in the same environment
  • Then, switch to PythonOperator and use the imported function, unless there is a reason I'm not seeing for running this with a BashOperator
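
For the first step, a small standalone check along these lines can show what Python actually resolves the name code to in that environment. This is only a sketch: describe_module is a hypothetical helper, while importlib.util.find_spec is part of the standard library. Note that the standard library also ships a plain module called code, so a result with is_package set to False would suggest that module is being picked up instead of the project's package.

```python
import importlib.util


def describe_module(name):
    """Report whether `name` can be found, where it lives, and if it is a package."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return {"found": False, "origin": None, "is_package": False}
    return {
        "found": True,
        "origin": spec.origin,
        # Only packages have submodule search locations; plain modules have None.
        "is_package": spec.submodule_search_locations is not None,
    }


info = describe_module("code")
print(info)
```

Running this with the same interpreter and environment variables that the Airflow task uses keeps the comparison fair.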

Upvotes: 1
