I'm building a Docker image for a data science project.
I install core dependencies via RUN apk add <package>.
Dockerfile-dev:
FROM python:3.6-alpine
# SOFTWARE PACKAGES
ENV PACKAGES="\
    dumb-init \
    musl \
    libc6-compat \
    linux-headers \
    build-base \
    bash \
    git \
    ca-certificates \
    freetype \
    libgfortran \
    libgcc \
    libstdc++ \
    openblas \
    tcl \
    tk \
    libssl1.0 \
    "
# PYTHON DATA SCIENCE PACKAGES
ENV PYTHON_PACKAGES="\
    numpy \
    matplotlib \
    scipy \
    scikit-learn \
    pandas \
    nltk \
    "
RUN apk add --no-cache --virtual build-dependencies python3 \
    && apk add --virtual build-runtime \
        build-base python3-dev openblas-dev freetype-dev pkgconfig gfortran \
    && ln -s /usr/include/locale.h /usr/include/xlocale.h \
    && python3 -m ensurepip \
    && rm -r /usr/lib/python*/ensurepip \
    && pip3 install --upgrade pip setuptools \
    && ln -sf /usr/bin/python3 /usr/bin/python \
    && ln -sf pip3 /usr/bin/pip \
    && rm -r /root/.cache \
    && pip install --no-cache-dir $PYTHON_PACKAGES \
    && apk del build-runtime \
    && apk add --no-cache --virtual build-dependencies $PACKAGES \
    && rm -rf /var/cache/apk/*
# add and install requirements
COPY ./requirements.txt /usr/src/app/requirements.txt
RUN pip install -r requirements.txt
Everything built fine up to pandas, at which point this error appeared:
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 359, in get_provider
module = sys.modules[moduleOrReq]
KeyError: 'numpy'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-v7gyw8y_/pandas/setup.py", line 732, in <module>
ext_modules=maybe_cythonize(extensions, compiler_directives=directives),
File "/tmp/pip-install-v7gyw8y_/pandas/setup.py", line 475, in maybe_cythonize
numpy_incl = pkg_resources.resource_filename('numpy', 'core/include')
File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1144, in resource_filename
return get_provider(package_or_requirement).get_resource_filename(
File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 361, in get_provider
__import__(moduleOrReq)
ModuleNotFoundError: No module named 'numpy'
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-v7gyw8y_/pandas/
But numpy HAD been installed beforehand:
Running setup.py install for numpy: finished with status 'done'
Not yet defeated, I moved pandas==0.20.3 (this version worked in my conda py36 env) into requirements.txt, and it was installed, as the log reveals:
Successfully built: pandas
Installing collected packages: pandas
Successfully installed: pandas-0.20.3
After build time, however, running the container logs the following error:
users_1 | File "/usr/src/app/project/api/classifiers/metadata/learn.py", line 14, in <module>
users_1 | import pandas as pd
users_1 | ModuleNotFoundError: No module named 'pandas'
So it was installed by pip but can't be found?
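One thing worth checking here, offered as a possibility rather than a confirmed diagnosis: the python:3.6-alpine base image ships its interpreter under /usr/local, while apk add python3 installs a separate one under /usr, so the pip that installed pandas and the python that runs the app may not belong to the same installation. A minimal sketch for inspecting this from inside the container follows; the helper name describe_module is made up for illustration.

```python
import importlib.util
import sys


def describe_module(name):
    """Return the file a module would be imported from, or None if this
    interpreter cannot find it. (Hypothetical helper for illustration.)"""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # The interpreter that is actually executing this script.
    print("interpreter:", sys.executable)
    # The installation prefix; this interpreter's site-packages lives under it.
    print("prefix:", sys.prefix)
    # Report where (or whether) each package is visible to this interpreter.
    for module_name in ("numpy", "pandas"):
        print(module_name, "->", describe_module(module_name))
```

Running this with the same python command the container uses, and comparing the printed paths with the output of pip --version (which reports the site-packages directory pip installs into), should show whether the two point at different installations.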
How do I install pandas via RUN apk add in order to keep build consistency for my data science project?
Reputation: 10033
Adding the following line inside Dockerfile-dev did the trick for me:
    && pip install --no-cache-dir $PYTHON_PACKAGES \
    # new line: install pandas with an explicit version constraint
    && pip3 install 'pandas<0.21.0' \
    && apk del build-runtime \
    && apk add --no-cache --virtual build-dependencies $PACKAGES \
I had to explicitly specify the pandas version.
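To make the constraint concrete: 'pandas<0.21.0' accepts any release numerically below 0.21.0, which includes the 0.20.3 that worked in the conda env. The sketch below is a deliberately simplified illustration of that comparison, not how pip itself evaluates specifiers (pip follows PEP 440, which also covers pre-releases and other forms). The function names are hypothetical, and only plain dotted numeric versions are handled.

```python
def parse_version(text):
    """Turn '0.20.3' into (0, 20, 3). Assumes a plain dotted numeric version."""
    return tuple(int(part) for part in text.split("."))


def is_below(version, upper_bound):
    """True if version is strictly lower than upper_bound, compared numerically."""
    return parse_version(version) < parse_version(upper_bound)


if __name__ == "__main__":
    for candidate in ("0.20.3", "0.21.0", "0.9.0"):
        print(candidate, "satisfies <0.21.0:", is_below(candidate, "0.21.0"))
```

Comparing tuples of integers rather than raw strings matters here: as strings, "0.9.0" would sort after "0.21.0", while numerically it is the older release.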