8-Bit Borges


Alpine Linux: package installed but module not found

I'm building a docker image for a data science project.

I install core dependencies via RUN apk add <package>.

Dockerfile-dev

FROM python:3.6-alpine

# SOFTWARE PACKAGES
ENV PACKAGES="\
    dumb-init \
    musl \
    libc6-compat \
    linux-headers \
    build-base \
    bash \
    git \
    ca-certificates \
    freetype \
    libgfortran \
    libgcc \
    libstdc++ \
    openblas \
    tcl \
    tk \
    libssl1.0 \
    "
# PYTHON DATA SCIENCE PACKAGES    
ENV PYTHON_PACKAGES="\
    numpy \
    matplotlib \
    scipy \
    scikit-learn \
    pandas \
    nltk \
    "     
RUN apk add --no-cache --virtual build-dependencies python3 \
    && apk add --virtual build-runtime \
    build-base python3-dev openblas-dev freetype-dev pkgconfig gfortran \
    && ln -s /usr/include/locale.h /usr/include/xlocale.h \
    && python3 -m ensurepip \
    && rm -r /usr/lib/python*/ensurepip \
    && pip3 install --upgrade pip setuptools \
    && ln -sf /usr/bin/python3 /usr/bin/python \
    && ln -sf pip3 /usr/bin/pip \
    && rm -r /root/.cache \
    && pip install --no-cache-dir $PYTHON_PACKAGES \
    && apk del build-runtime \
    && apk add --no-cache --virtual build-dependencies $PACKAGES \
    && rm -rf /var/cache/apk/*

# add and install requirements
COPY ./requirements.txt /usr/src/app/requirements.txt
RUN pip install -r /usr/src/app/requirements.txt

Everything built successfully up to pandas, at which point this error appeared:

    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 359, in get_provider
        module = sys.modules[moduleOrReq]
    KeyError: 'numpy'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-v7gyw8y_/pandas/setup.py", line 732, in <module>
        ext_modules=maybe_cythonize(extensions, compiler_directives=directives),
      File "/tmp/pip-install-v7gyw8y_/pandas/setup.py", line 475, in maybe_cythonize
        numpy_incl = pkg_resources.resource_filename('numpy', 'core/include')
      File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1144, in resource_filename
        return get_provider(package_or_requirement).get_resource_filename(
      File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 361, in get_provider
        __import__(moduleOrReq)
    ModuleNotFoundError: No module named 'numpy'

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-v7gyw8y_/pandas/

But numpy HAD been installed beforehand:

Running setup.py install for numpy: finished with status 'done'

Not yet defeated, I moved pandas==0.20.3 (this version worked in my conda py36 env) into requirements.txt, and it was installed, as the log shows:

Successfully built: pandas
Installing collected packages: pandas
Successfully installed: pandas-0.20.3

After the build, however, running the container logs the following error:

users_1     | File "/usr/src/app/project/api/classifiers/metadata/learn.py", line 14, in <module>
users_1     |     import pandas as pd
users_1     | ModuleNotFoundError: No module named 'pandas'

So it was installed by pip but can't be found?
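
One way to narrow this down is to ask the interpreter that runs the app which binary it is and where it would load a module from. The following is only a minimal diagnostic sketch, not part of my original setup: describe_module is a hypothetical helper name, and the underlying assumption (unconfirmed here) is that the image may contain more than one Python installation, since the base image ships its own Python under /usr/local while apk add python3 installs another one under /usr.

```python
import importlib.util
import sys


def describe_module(name):
    """Return the file the current interpreter would load `name` from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # Which Python binary is actually running this script?
    print("interpreter:", sys.executable)
    # Which directories does this interpreter search for packages?
    for path in sys.path:
        print("search path:", path)
    # Can this interpreter see the packages pip reported as installed?
    for name in ("numpy", "pandas"):
        print(name, "->", describe_module(name) or "NOT FOUND")
```

Running it inside the container with the same python command that starts the app, and comparing the printed paths with the site-packages directory shown in the pip output, should indicate whether pip and the runtime are using the same installation.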

How do I install pandas via RUN apk add in order to keep the build consistent for my data science project?

Upvotes: 0

Views: 3621

Answers (1)

8-Bit Borges


Adding a pip3 install 'pandas<0.21.0' step to the RUN chain in Dockerfile-dev, right after the $PYTHON_PACKAGES install, did the trick for me:

&& pip install --no-cache-dir $PYTHON_PACKAGES \
&& pip3 install 'pandas<0.21.0' \
&& apk del build-runtime \
&& apk add --no-cache --virtual build-dependencies $PACKAGES \

I had to specify the pandas version explicitly.

Upvotes: 1
