Reputation: 1219
I am building a Docker image that needs pandas and numpy, but installing them via pip takes around 20 minutes, which is too long for my use case. I then opted to install pandas and numpy from the Alpine package repo instead, but numpy fails to import correctly.
Here is my Dockerfile:
# syntax=docker/dockerfile:experimental
FROM python:3.9.5-alpine as base
FROM base as builder
RUN apk add build-base gcc musl-dev
RUN --mount=type=cache,target=/root/.cache/pip \
pip install --target="/install" django
FROM base
RUN apk add py3-pandas py3-numpy
COPY --from=builder /install /usr/local/lib/python3.9/site-packages
ENV PYTHONPATH "${PYTHONPATH}:/usr/lib/python3.9/site-packages"
CMD ["python"]
When I try to import pandas, which depends on numpy, I get this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.9/site-packages/pandas/__init__.py", line 16, in <module>
raise ImportError(
ImportError: Unable to import required dependencies:
numpy:
IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!
Importing the numpy C-extensions failed. This error can happen for
many reasons, often due to issues with your setup or how NumPy was
installed.
We have compiled some common reasons and troubleshooting tips at:
https://numpy.org/devdocs/user/troubleshooting-importerror.html
Please note and check the following:
* The Python version is: Python3.9 from "/usr/local/bin/python"
* The NumPy version is: "1.20.3"
and make sure that they are the versions you expect.
Please carefully study the documentation linked above for further help.
Original error was: No module named 'numpy.core._multiarray_umath'
And here is the error when I import numpy itself:
Traceback (most recent call last):
File "/usr/lib/python3.9/site-packages/numpy/core/__init__.py", line 22, in <module>
from . import multiarray
File "/usr/lib/python3.9/site-packages/numpy/core/multiarray.py", line 12, in <module>
from . import overrides
File "/usr/lib/python3.9/site-packages/numpy/core/overrides.py", line 7, in <module>
from numpy.core._multiarray_umath import (
ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.9/site-packages/numpy/__init__.py", line 145, in <module>
from . import core
File "/usr/lib/python3.9/site-packages/numpy/core/__init__.py", line 48, in <module>
raise ImportError(msg)
ImportError:
IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!
Importing the numpy C-extensions failed. This error can happen for
many reasons, often due to issues with your setup or how NumPy was
installed.
We have compiled some common reasons and troubleshooting tips at:
https://numpy.org/devdocs/user/troubleshooting-importerror.html
Please note and check the following:
* The Python version is: Python3.9 from "/usr/local/bin/python"
* The NumPy version is: "1.20.3"
and make sure that they are the versions you expect.
Please carefully study the documentation linked above for further help.
Original error was: No module named 'numpy.core._multiarray_umath'
I am at my wits' end trying to figure out what I missed or did wrong. I have already tried the troubleshooting tips at the URL given in the error trace, but nothing seems to solve the issue.
Any help is greatly appreciated.
Upvotes: 2
Views: 1190
Reputation: 106
I know it's been a while since this was asked, and you might've found a solution, or moved on from Alpine to a different distro. But I ran into the same issue, and this was the first thing that popped up on my search. So, after spending a couple of hours and finding a solution, I think it's worthwhile to document it here.
The issue is (obviously) with the numpy and pandas packages. I used pre-built wheels from the community repo and ran into the same issue as you. So, evidently, the build process itself is introducing the issue. Specifically, if you look, e.g., under numpy/core at the install location (/usr/lib/python3.9/site-packages), you'll find that all the C-extensions have .cpython-39-x86_64-linux-musl in their name. So, for instance, the module you're having trouble with, numpy.core._multiarray_umath, is named _multiarray_umath.cpython-39-x86_64-linux-musl.so, and not just _multiarray_umath.so. Dropping the .cpython-39-x86_64-linux-musl from those filenames fixed the issue (edit: see addendum for details).
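To see those names for yourself, here is a minimal sketch that lists the shared objects in a directory and splits each filename into the module name and the tag that follows it. The helper names split_extension_name and list_extension_tags are made up for illustration, and the sketch assumes the files end in .so as they do on Alpine.

```python
from pathlib import Path


def split_extension_name(filename):
    """Split 'mod.<tag>.so' into (module, tag); tag is '' for a bare 'mod.so'."""
    stem = filename[: -len(".so")] if filename.endswith(".so") else filename
    module, _, tag = stem.partition(".")
    return module, tag


def list_extension_tags(directory):
    """Return sorted (module, tag) pairs for every .so file directly in `directory`."""
    return sorted(split_extension_name(p.name) for p in Path(directory).glob("*.so"))
```

Calling list_extension_tags("/usr/lib/python3.9/site-packages/numpy/core") inside the container should show the musl tag on every entry, if your image matches what I saw.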
The following line can be added to your Dockerfile after installing py3-pandas and py3-numpy to fix it:
RUN find /usr/lib/python3.9/site-packages -name "*.cpython-39-x86_64-linux-musl.so" -exec sh -c 'mv "$1" "${1%.cpython-39-x86_64-linux-musl.so}.so"' _ {} \;
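If you'd rather preview the renames before touching anything, the same idea can be sketched in Python. This is only an illustration of the approach, not something I ran in the image: plan_renames and apply_renames are hypothetical helpers, and the tag is the one from my setup, so adjust it if yours differs.

```python
from pathlib import Path

# Tag observed in the Alpine packages described above (an assumption for other setups).
TAG = ".cpython-39-x86_64-linux-musl"


def plan_renames(root, tag=TAG):
    """Return (old, new) path pairs for every '<name><tag>.so' file under root."""
    suffix = tag + ".so"
    pairs = []
    for path in sorted(Path(root).rglob("*" + suffix)):
        new_name = path.name[: -len(suffix)] + ".so"  # drop the tag, keep '.so'
        pairs.append((path, path.with_name(new_name)))
    return pairs


def apply_renames(pairs):
    """Rename each old path to its new path."""
    for old, new in pairs:
        old.rename(new)
```

Printing plan_renames("/usr/lib/python3.9/site-packages") first gives a dry run; passing the result to apply_renames performs the renames.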
P.S.: After looking into the issue further, I found the culprit: for some reason, the Python running under Alpine thinks its full platform extension suffix (available from importlib.machinery.EXTENSION_SUFFIXES) should be .cpython-39-x86_64-linux-gnu.so, and not .cpython-39-x86_64-linux-musl.so. I don't believe it was built with glibc, but who knows. So, you could just change the musl to gnu in the names of those shared objects above, and it'd work as well. I'm not sure why the extension suffix generated during the build is different from the one used by Python at runtime.
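Here is a rough sketch of the check described in the P.S.: a filename is only picked up as an extension module if it equals the module name plus one of the suffixes the interpreter accepts. The helper is_loadable_name is hypothetical, and the example suffix list in the comment is what I'd expect a typical Linux CPython 3.9 to report, so treat it as an assumption and print your own.

```python
import importlib.machinery
import sysconfig


def is_loadable_name(filename, module, suffixes=None):
    """True if `filename` is `module` followed by one of the accepted extension suffixes."""
    if suffixes is None:
        suffixes = importlib.machinery.EXTENSION_SUFFIXES
    return any(filename == module + s for s in suffixes)


if __name__ == "__main__":
    # Suffixes this interpreter accepts, e.g. something like
    # ['.cpython-39-x86_64-linux-gnu.so', '.abi3.so', '.so'] (assumed, not verified here).
    print(importlib.machinery.EXTENSION_SUFFIXES)
    # Suffix the interpreter's build configuration uses for compiled extensions.
    print(sysconfig.get_config_var("EXT_SUFFIX"))
```

With a list like the one in the comment, both fixes make sense: a plain .so name matches the last entry, and a name with gnu in place of musl matches the first one, while the musl name matches nothing.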
Upvotes: 7