Reputation: 983
How do I resolve "No module named pandas" when one task in an Airflow DAG can use pandas and another cannot?
I can't work out why I am getting the error "No module named pandas".
I have checked with pip3 freeze, and the expected pandas version does show up.
I have deployed this using Docker on a Kubernetes cluster.
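One way to narrow this down is to have each task report which interpreter it runs under and whether (and from where) that interpreter can find pandas. The sketch below uses only the standard library; `describe_environment` is a hypothetical helper name, and calling it from inside each task (for example from a PythonOperator callable) is an assumption about how you would wire it in.

```python
import importlib.util
import sys


def describe_environment(module_name: str = "pandas") -> dict:
    """Report the running interpreter and whether `module_name` is importable from it.

    Hypothetical helper: call it inside each task so the task logs show which
    Python executable and which sys.path that task actually used.
    """
    spec = importlib.util.find_spec(module_name)  # None if the module cannot be found
    return {
        "executable": sys.executable,  # interpreter running this code
        "python_version": sys.version.split()[0],
        "module": module_name,
        "found": spec is not None,
        # spec.origin is the file the module would be loaded from
        "origin": spec.origin if spec is not None else None,
        "sys_path": list(sys.path),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the failing task reports a different `executable` or `sys_path` from the environment where `pip3 freeze` lists pandas, the two tasks are not sharing an environment.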
Upvotes: 3
Views: 2414
Reputation: 4961
In my case, I was running Airflow with Docker Compose, using a custom Docker image that installed additional packages with pip install.
Dockerfile:
FROM apache/airflow:slim-2.7.2-python3.11
ADD airflow/requirements.txt .
RUN pip install -r requirements.txt
docker-compose.yml:
services:
  airflow:
    build: .
    environment:
      AIRFLOW__CORE__EXECUTOR: SequentialExecutor
      AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
      AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
      AIRFLOW__CORE__DAGS_FOLDER: /data/dags
      _AIRFLOW_DB_MIGRATE: 'true'
      _AIRFLOW_WWW_USER_CREATE: 'true'
      _AIRFLOW_WWW_USER_USERNAME: airflow
      _AIRFLOW_WWW_USER_PASSWORD: airflow
      AIRFLOW__WEBSERVER__EXPOSE_CONFIG: 'true'
      AIRFLOW__SECRETS__BACKEND: airflow.secrets.local_filesystem.LocalFilesystemBackend
      AIRFLOW__SECRETS__BACKEND_KWARGS: '{"variables_file_path": "/data/variables.yml", "connections_file_path": "/data/connections.yml"}'
    ports:
      - '8080:8080'
      - '8793:8793'
      - '8794:8794'
    volumes:
      - ./airflow:/data
    command: 'standalone'
I was originally getting the error ModuleNotFoundError: No module named 'pandas' at DAG import time. That error went away when I installed pandas by adding the line pandas==2.1.1 to requirements.txt.
But then I started getting the same error on the same DAG, only this time when executing it. At first glance it looked like the same error as before, but on closer inspection it was raised by a different package:
ModuleNotFoundError: No module named 'pandas'
During handling of the above exception, another exception occurred:
...
Exception: pandas library not installed, run: pip install 'apache-airflow-providers-common-sql[pandas]'.
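The traceback contains two exceptions because the provider catches the failed pandas import and raises its own exception from inside the except block, so Python chains the two implicitly. The sketch below reproduces that pattern with a hypothetical `import_optional` helper; it illustrates the mechanism only and is not the provider's source.

```python
import importlib


def import_optional(module_name: str, install_hint: str):
    """Import an optional dependency, raising a friendlier error if it is missing.

    Hypothetical illustration of the pattern, not the provider's actual code.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Raising inside the except block attaches the original error as
        # __context__, which is why the log prints "During handling of the
        # above exception, another exception occurred".
        raise Exception(f"{module_name} library not installed, run: {install_hint}")


if __name__ == "__main__":
    try:
        import_optional("no_such_module_xyz", "pip install no_such_module_xyz")
    except Exception as exc:
        # The original ModuleNotFoundError is still reachable via __context__
        print(type(exc.__context__).__name__, "->", exc)
```

Both messages therefore refer to the same missing import; the second one is the provider's hint about which extra pulls pandas in.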
The runtime error went away when I installed apache-airflow-providers-common-sql[pandas] by adding the line apache-airflow-providers-common-sql[pandas]==1.7.2 to requirements.txt. I did have to remove the pandas==2.1.1 line from requirements.txt: when both apache-airflow-providers-common-sql[pandas] and pandas were specified, there appeared to be a version mismatch, and the ModuleNotFoundError: No module named 'pandas' error still persisted at runtime.
Final requirements.txt:
apache-airflow-providers-postgres==5.6.1
apache-airflow-providers-common-sql[pandas]==1.7.2
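To confirm what actually ended up inside the image (the same information as pip3 freeze, but runnable with the image's own interpreter or from within a task), a small standard-library sketch follows. `installed_versions` and `EXPECTED` are hypothetical names; the list simply mirrors the final requirements.txt above plus pandas, which should arrive through the common-sql extra.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Optional

# Hypothetical list: distributions from the final requirements.txt, plus pandas,
# which is expected to be installed via the common-sql "pandas" extra.
EXPECTED = [
    "apache-airflow-providers-postgres",
    "apache-airflow-providers-common-sql",
    "pandas",
]


def installed_versions(distributions: List[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    result: Dict[str, Optional[str]] = {}
    for name in distributions:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(EXPECTED).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Running this with the same interpreter that executes the tasks avoids checking one environment while Airflow uses another.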
Upvotes: 0
Reputation: 45341
Pandas is generally a requirement of Airflow, and some hooks use it to return DataFrames. It's possible that Airflow was installed with pip rather than pip3, and was therefore added as a Python 2 module rather than a Python 3 module (though, judging by setup.py, installing with pip should still have pulled in pandas).
Which operator in your DAG is raising this error?
Do you have any PythonVirtualenvOperators, or BashOperators running python from the command line (and therefore possibly not sharing the environment in which you checked for pandas)?
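To test the idea that a task shelling out to python may not share the environment you checked, you can compare the worker's own interpreter with whatever python or python3 resolve to on PATH, and try importing pandas in each. This is a standard-library sketch; `can_import` is a hypothetical helper, and the PATH lookup only approximates what a BashOperator's shell would resolve.

```python
import shutil
import subprocess
import sys


def can_import(interpreter: str, module_name: str = "pandas") -> bool:
    """Return True if `interpreter` can import `module_name` in a fresh process.

    The module name is passed as an argument (sys.argv[1]) rather than being
    pasted into the -c source string.
    """
    completed = subprocess.run(
        [interpreter, "-c", "import importlib, sys; importlib.import_module(sys.argv[1])", module_name],
        capture_output=True,
        text=True,
    )
    return completed.returncode == 0


if __name__ == "__main__":
    candidates = {"worker (sys.executable)": sys.executable}
    for name in ("python", "python3"):
        path = shutil.which(name)  # roughly what a shell command would run
        if path:
            candidates[f"PATH {name}"] = path
    for label, path in candidates.items():
        print(f"{label}: {path} -> pandas importable: {can_import(path)}")
```

If the worker's interpreter can import pandas but the one found on PATH cannot, tasks that run python through a shell will fail even though pip3 freeze looks correct.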
Upvotes: 1