yudhiesh

Reputation: 6809

pytest failing due to ModuleNotFoundError

I am performing an integrity test on my Airflow DAGs using pytest. This is my current folder structure:

|-- dags
|   |-- 01_lasic_retraining_overview.py
|   |-- 02_lasic_retraining_sagemaker_autopilot.py
|   |-- 03_lasic_retraining_h20_automl.py
|   |-- __init__.py
|   `-- common
|       |-- __init__.py
|       `-- helper.py
|-- docker-compose.yaml
|-- newrelic.ini
|-- plugins
|-- requirements.txt
|-- sample.env
|-- setup.sh
|-- test.sh
`-- tests
    |-- common
    |   `-- test_helper.py
    `-- dags
        |-- test_02_lasic_retraining_sagemaker_autopilot.py
        |-- test_03_lasic_retraining_h20_automl.py
        `-- test_dag_integrity.py

In all my DAGs except 01_lasic_retraining_overview.py (which I am not testing), I import helper functions from dags/common/helper.py, and this import is what makes the test fail:

    import airflow
    from airflow import DAG
    from airflow.exceptions import AirflowFailException
    from airflow.operators.python import PythonOperator
    from airflow.providers.amazon.aws.hooks.s3 import S3Hook
>   from common.helper import _create_connection, _etl_lasic
E   ModuleNotFoundError: No module named 'common'

dags/03_lasic_retraining_h20_automl.py:6: ModuleNotFoundError
=================================== short test summary info ===================================
FAILED tests/dags/test_dag_integrity.py::test_dag_integrity[/Users/yravindranath/algo_lasic2_ct_pipeline/tests/dags/../../dags/02_lasic_retraining_sagemaker_autopilot.py]
FAILED tests/dags/test_dag_integrity.py::test_dag_integrity[/Users/yravindranath/algo_lasic2_ct_pipeline/tests/dags/../../dags/03_lasic_retraining_h20_automl.py]

This code runs with no issues in my Docker container. Things that I have tried that did not work:

  1. adding __init__.py to the tests folder.
  2. running python -m pytest tests/
  3. removing the __init__.py files in the dags directory
  4. running PYTHONPATH=. pytest
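For what it's worth, the failure can be reproduced outside Airflow and pytest entirely: loading a file via importlib.util.spec_from_file_location (as the integrity test below does) does not add that file's directory to sys.path, so absolute imports inside the loaded module resolve against the test process's search path. A minimal sketch with a hypothetical throwaway layout:

```python
# sketch: spec_from_file_location does not extend sys.path, so absolute
# imports inside the loaded file resolve against the *loading process's*
# search path, not the loaded file's own directory
import importlib.util
import os
import tempfile

tmp = tempfile.mkdtemp()

# a "dag" file next to a common/ package, mirroring the dags/ layout
os.makedirs(os.path.join(tmp, "common"))
open(os.path.join(tmp, "common", "__init__.py"), "w").close()
dag_path = os.path.join(tmp, "mydag.py")
with open(dag_path, "w") as f:
    f.write("import common\n")

spec = importlib.util.spec_from_file_location("mydag", dag_path)
module = importlib.util.module_from_spec(spec)
err = None
try:
    spec.loader.exec_module(module)
except ModuleNotFoundError as e:
    err = e

print(err)  # No module named 'common'
```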

The code for the integrity test is at tests/dags/test_dag_integrity.py:

import re
import glob
import importlib.util
import os

import pytest
from airflow.models import DAG

# go to the root dir and browse for any files that match the pattern
# this will find all the dag files
DAG_PATH = os.path.join(
    os.path.dirname(__file__),
    "..",
    "..",
    "dags/**/0*.py",
)

# holds a list of all the dag files
DAG_FILES = glob.glob(
    DAG_PATH,
    recursive=True,
)
# filter the files to exclude the 01 dag run as that is just a plan of the
# pipeline
DAG_FILES = [file for file in DAG_FILES if not re.search("/01", file)]


@pytest.mark.parametrize("dag_file", DAG_FILES)
def test_dag_integrity(dag_file):
    # Load file
    module_name, _ = os.path.splitext(dag_file)
    module_path = os.path.join(DAG_PATH, dag_file)
    mod_spec = importlib.util.spec_from_file_location(
        module_name,
        module_path,
    )
    module = importlib.util.module_from_spec(
        mod_spec,  # type: ignore
    )
    mod_spec.loader.exec_module(module)  # type: ignore
    # all objects of class DAG found in file
    dag_objects = [
        var
        for var in vars(module).values()
        if isinstance(
            var,
            DAG,
        )
    ]
    # check if DAG objects were found in the file
    assert dag_objects
    # check if there are no cycles in the dags
    for dag in dag_objects:
        dag.test_cycle()  # type: ignore

Upvotes: 1

Views: 1387

Answers (4)

bad programmer

Reputation: 934

  • Make tests/conftest.py file
  • Create this fixture inside conftest.py
Make sure the path to the common module is correct:
import pytest
import sys

@pytest.fixture(scope='session')
def append_path():
    sys.path.insert(0, 'absolute_path_to_common_module')
    yield
  • Now use this fixture as:
@pytest.mark.usefixtures("append_path")
@pytest.mark.parametrize("dag_file", DAG_FILES)
def test_dag_integrity(dag_file):
    ...

What are we doing?

  • Making sure the module is visible to Python.

Note: you could rename your custom module common to something less common and more unique (no pun intended) to avoid any conflicts.
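As a variant of the fixture above, the same effect can be sketched without hard-coding an absolute path: statements at module level in tests/conftest.py run before pytest collects any tests, so the dags/ directory can be derived from conftest.py's own location (this assumes the folder layout from the question):

```python
# tests/conftest.py -- sketch: put dags/ on sys.path before collection,
# mirroring the container where dags/ is the import root
import os
import sys

# conftest.py sits in tests/, so dags/ is a sibling one level up
HERE = os.path.dirname(os.path.abspath(__file__))
DAGS_DIR = os.path.abspath(os.path.join(HERE, "..", "dags"))

if DAGS_DIR not in sys.path:
    sys.path.insert(0, DAGS_DIR)
```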

Upvotes: 0

Saidden

Reputation: 111

Throwing a crazy idea here: try adding __init__.py both to */dags (or */common) and to */tests.

Upvotes: 0

yudhiesh

Reputation: 6809

I am also running the application in a Docker container, where the answer provided by @Jarek Potiuk didn't work when actually running the DAG. Instead, I am using a hacky workaround: try the import that works locally with the tests, and fall back to the one that works in the Docker container.

try:
    # Works locally with tests
    from common.helper import _create_connection, _etl_lasic
except ImportError:
    # Works in docker container
    from dags.common.helper import _create_connection, _etl_lasic

Upvotes: 0

Jarek Potiuk

Reputation: 20077

You need to check what your PYTHONPATH is. You likely do not have dags on your PYTHONPATH; it probably points to the root of your file structure, so the right way to import the common folder is

import dags.common

Similarly, your common test code is imported as

import tests.common

Python (even Python 3) does not have a good mechanism for importing relative to the currently loaded file. There are "relative" imports (with a "." in front), but they are confusing and work differently than you might think; avoid them. Simply make sure your imports are absolute from the project root.

Also avoid setting PYTHONPATH to ".". That makes your imports work differently depending on your current directory. The best way is to set it once, as an absolute path, and export it:

export PYTHONPATH="$(pwd)"

The above sets PYTHONPATH to the absolute path of the directory you are currently in.
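To illustrate with a throwaway sketch (hypothetical /tmp paths): once the project root is exported on PYTHONPATH, a package directory under it becomes importable by its absolute name, regardless of where the interpreter is launched from:

```shell
# sketch: PYTHONPATH="$(pwd)" makes packages under the current
# directory importable with absolute imports (assumes python3 on PATH)
mkdir -p /tmp/pp_demo/dags/common
touch /tmp/pp_demo/dags/__init__.py /tmp/pp_demo/dags/common/__init__.py
printf 'GREETING = "hello"\n' > /tmp/pp_demo/dags/common/helper.py
cd /tmp/pp_demo
export PYTHONPATH="$(pwd)"
# the same absolute import style the answer recommends
python3 -c 'from dags.common.helper import GREETING; print(GREETING)'
```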

Upvotes: 1
