Reputation: 6809
I am running an integrity test on my Airflow DAGs using pytest. This is my current folder structure:
|-- dags
|   |-- 01_lasic_retraining_overview.py
|   |-- 02_lasic_retraining_sagemaker_autopilot.py
|   |-- 03_lasic_retraining_h20_automl.py
|   |-- __init__.py
|   `-- common
|       |-- __init__.py
|       `-- helper.py
|-- docker-compose.yaml
|-- newrelic.ini
|-- plugins
|-- requirements.txt
|-- sample.env
|-- setup.sh
|-- test.sh
`-- tests
    |-- common
    |   `-- test_helper.py
    `-- dags
        |-- test_02_lasic_retraining_sagemaker_autopilot.py
        |-- test_03_lasic_retraining_h20_automl.py
        `-- test_dag_integrity.py
In all my DAGs except 01_lasic_retraining_overview.py (which I am not testing), I import helper functions from dags/common/helper.py, and that import is what fails the test:
import airflow
from airflow import DAG
from airflow.exceptions import AirflowFailException
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.hooks.s3 import S3Hook
> from common.helper import _create_connection, _etl_lasic
E ModuleNotFoundError: No module named 'common'
dags/03_lasic_retraining_h20_automl.py:6: ModuleNotFoundError
=================================== short test summary info ===================================
FAILED tests/dags/test_dag_integrity.py::test_dag_integrity[/Users/yravindranath/algo_lasic2_ct_pipeline/tests/dags/../../dags/02_lasic_retraining_sagemaker_autopilot.py]
FAILED tests/dags/test_dag_integrity.py::test_dag_integrity[/Users/yravindranath/algo_lasic2_ct_pipeline/tests/dags/../../dags/03_lasic_retraining_h20_automl.py]
Now, this code runs with no issue in my Docker container. Things that I have tried and that did not work:
- Adding __init__.py to the tests folder.
- python -m pytest tests/
- __init__.py files in the dir dags
- PYTHONPATH=. pytest
/tests/dags/test_dag_integrity.py
import re
import glob
import importlib.util
import os

import pytest
from airflow.models import DAG

# go to the root dir and browse for any files that match the pattern
# this will find all the dag files
DAG_PATH = os.path.join(
    os.path.dirname(__file__),
    "..",
    "..",
    "dags/**/0*.py",
)

# holds a list of all the dag files
DAG_FILES = glob.glob(
    DAG_PATH,
    recursive=True,
)

# filter the files to exclude the 01 dag run as that is just a plan of the
# pipeline
DAG_FILES = [file for file in DAG_FILES if not re.search("/01", file)]


@pytest.mark.parametrize("dag_file", DAG_FILES)
def test_dag_integrity(dag_file):
    # Load file
    module_name, _ = os.path.splitext(dag_file)
    module_path = os.path.join(DAG_PATH, dag_file)
    mod_spec = importlib.util.spec_from_file_location(
        module_name,
        module_path,
    )
    module = importlib.util.module_from_spec(
        mod_spec,  # type: ignore
    )
    mod_spec.loader.exec_module(module)  # type: ignore

    # all objects of class DAG found in file
    dag_objects = [
        var
        for var in vars(module).values()
        if isinstance(
            var,
            DAG,
        )
    ]

    # check if DAG objects were found in the file
    assert dag_objects

    # check if there are no cycles in the dags
    for dag in dag_objects:
        dag.test_cycle()  # type: ignore
Upvotes: 1
Views: 1387
Reputation: 934
Add the following to your tests/conftest.py file, and make sure the path to common is correct:
import pytest
import sys

@pytest.fixture(scope='session')
def append_path():
    sys.path.insert(0, 'absolute_path_to_common_module')
    yield

Then apply the fixture to the test:
@pytest.mark.usefixtures("append_path")
@pytest.mark.parametrize("dag_file", DAG_FILES)
def test_dag_integrity(dag_file):
    .....
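What are we doing? The fixture puts the directory you pass at the front of sys.path before the tests run, so Python searches it first when the DAG file executes from common.helper import .... For that import to resolve, the directory has to be the one that contains the common package.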
Note: you could rename your custom module common to something less common and more unique (no pun intended) to avoid any conflicts.
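If you would rather not hard-code an absolute path, the directory can be derived from the location of conftest.py itself. This is only a sketch: the helper names dags_dir_for and prepend_to_sys_path are made up for illustration, and it assumes conftest.py sits in tests/ right under the project root, next to dags/, as in the tree from the question.

```python
import sys
from pathlib import Path


def dags_dir_for(conftest_file: str) -> Path:
    """Return <project root>/dags, assuming conftest.py lives in <project root>/tests/."""
    return Path(conftest_file).resolve().parent.parent / "dags"


def prepend_to_sys_path(directory: Path) -> None:
    """Put directory at the front of sys.path unless it is already listed."""
    entry = str(directory)
    if entry not in sys.path:
        sys.path.insert(0, entry)
```

In tests/conftest.py you would then call prepend_to_sys_path(dags_dir_for(__file__)), either at module level or inside the append_path fixture in place of the hard-coded string.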
Upvotes: 0
Reputation: 111
Throwing a crazy idea out here: try adding __init__.py to both */dag (or */common) and */tests.
Upvotes: 0
Reputation: 6809
I am also running the application in a Docker container, where the answer provided by @Jarek Potiuk didn't work when actually running the DAG. So instead I am using a hacky workaround: I include both the import that works in Docker and the one that works locally, and fall back from one to the other.
try:
    # Works in docker container
    from common.helper import _create_connection, _etl_lasic
except ImportError:
    # Works locally with tests
    from dags.common.helper import _create_connection, _etl_lasic
Upvotes: 0
Reputation: 20077
You need to check what your PYTHONPATH is. You likely do not have dags in your PYTHONPATH. Your PYTHONPATH likely points to the root of your file structure, so the right way to import the "common" folder is
import dags.common
Similarly, your common test code is imported as
import tests.common
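To see how the directory on the path decides which spelling works, here is a small self-contained sketch. It is not the asker's project: it builds a throwaway copy of the dags/common/helper.py layout in a temporary directory, and the functions build_layout and importable are hypothetical names introduced only for this demo.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_layout(root: Path) -> None:
    """Create root/dags/common/helper.py with __init__.py files, as in the question."""
    common = root / "dags" / "common"
    common.mkdir(parents=True)
    (root / "dags" / "__init__.py").write_text("")
    (common / "__init__.py").write_text("")
    (common / "helper.py").write_text("VALUE = 42\n")


def importable(module_name: str, path_entry: Path) -> bool:
    """Try importing module_name with path_entry temporarily first on sys.path."""
    saved_path = list(sys.path)
    saved_modules = set(sys.modules)
    sys.path.insert(0, str(path_entry))
    importlib.invalidate_caches()
    try:
        importlib.import_module(module_name)
        return True
    except ModuleNotFoundError:
        return False
    finally:
        # Undo the path change and forget anything imported during the attempt.
        sys.path[:] = saved_path
        for name in set(sys.modules) - saved_modules:
            del sys.modules[name]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_layout(root)
    results = {
        "root on path, dags.common.helper": importable("dags.common.helper", root),
        "root on path, common.helper": importable("common.helper", root),
        "dags on path, common.helper": importable("common.helper", root / "dags"),
    }
```

With the project root on the path, only the dags.common.helper spelling resolves; common.helper resolves only when the dags directory itself is on the path. This assumes no unrelated package named common is installed in the environment.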
Python (even Python 3) does not have a very good mechanism for importing things relative to the currently loaded file. There are "relative" imports (with a "." in front), but they are confusing and work differently than you might expect. Avoid using them. Simply make sure your.
Also avoid setting PYTHONPATH to ".". It makes your imports behave differently depending on what your current directory is. The best way is to set it once and export it.
export PYTHONPATH="$(pwd)"
The above will set the PYTHONPATH to the directory you are currently in, as an absolute path.
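As a rough illustration of why "." is fragile, the sketch below expands a PYTHONPATH-style string against the current working directory from two different locations. The function absolute_entries is a made-up helper for this demo, not something Python or Airflow provides.

```python
import os
import tempfile


def absolute_entries(pythonpath: str) -> list:
    """Expand each PYTHONPATH-style entry against the current working directory."""
    return [os.path.abspath(entry) for entry in pythonpath.split(os.pathsep) if entry]


original_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    try:
        os.chdir(first)
        relative_from_first = absolute_entries(".")
        absolute_from_first = absolute_entries(first)

        os.chdir(second)
        relative_from_second = absolute_entries(".")
        absolute_from_second = absolute_entries(first)
    finally:
        # Step out of the temporary directories before they are removed.
        os.chdir(original_cwd)
```

The "." entry names a different directory after the chdir, while the absolute entry names the same directory from both locations, which is the reason to export an absolute path once.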
Upvotes: 1