Jerven Clark

Reputation: 1219

Numpy from alpine package repo fails to import c-extensions

I am building a Docker image that needs pandas and numpy, but installing them via pip takes around 20 minutes, which is too long for my use case. I therefore opted to install pandas and numpy from the Alpine package repo instead, but numpy then fails to import correctly.

Here is my Dockerfile:

# syntax=docker/dockerfile:experimental
FROM python:3.9.5-alpine as base

FROM base as builder
RUN apk add build-base gcc musl-dev

RUN --mount=type=cache,target=/root/.cache/pip \
    pip install --target="/install" django

FROM base
RUN apk add py3-pandas py3-numpy

COPY --from=builder /install /usr/local/lib/python3.9/site-packages

ENV PYTHONPATH "${PYTHONPATH}:/usr/lib/python3.9/site-packages"

CMD ["python"]

When I try to import pandas, which depends on numpy, I get this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.9/site-packages/pandas/__init__.py", line 16, in <module>
    raise ImportError(
ImportError: Unable to import required dependencies:
numpy: 

IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!

Importing the numpy C-extensions failed. This error can happen for
many reasons, often due to issues with your setup or how NumPy was
installed.

We have compiled some common reasons and troubleshooting tips at:

    https://numpy.org/devdocs/user/troubleshooting-importerror.html

Please note and check the following:

  * The Python version is: Python3.9 from "/usr/local/bin/python"
  * The NumPy version is: "1.20.3"

and make sure that they are the versions you expect.
Please carefully study the documentation linked above for further help.

Original error was: No module named 'numpy.core._multiarray_umath'

And this is the error if I import numpy directly:

Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/numpy/core/__init__.py", line 22, in <module>
    from . import multiarray
  File "/usr/lib/python3.9/site-packages/numpy/core/multiarray.py", line 12, in <module>
    from . import overrides
  File "/usr/lib/python3.9/site-packages/numpy/core/overrides.py", line 7, in <module>
    from numpy.core._multiarray_umath import (
ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.9/site-packages/numpy/__init__.py", line 145, in <module>
    from . import core
  File "/usr/lib/python3.9/site-packages/numpy/core/__init__.py", line 48, in <module>
    raise ImportError(msg)
ImportError: 

IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!

Importing the numpy C-extensions failed. This error can happen for
many reasons, often due to issues with your setup or how NumPy was
installed.

We have compiled some common reasons and troubleshooting tips at:

    https://numpy.org/devdocs/user/troubleshooting-importerror.html

Please note and check the following:

  * The Python version is: Python3.9 from "/usr/local/bin/python"
  * The NumPy version is: "1.20.3"

and make sure that they are the versions you expect.
Please carefully study the documentation linked above for further help.

Original error was: No module named 'numpy.core._multiarray_umath'

I am at my wits' end trying to figure out what I missed or did wrong. I have already tried the troubleshooting tips at the URL given in the error message, but nothing seems to solve the issue.

Any help is greatly appreciated.

Upvotes: 2

Views: 1190

Answers (1)

Mr. 47

Reputation: 106

I know it's been a while since this was asked, and you might've found a solution, or moved on from Alpine to a different distro. But I ran into the same issue, and this was the first thing that popped up on my search. So, after spending a couple of hours and finding a solution, I think it's worthwhile to document it here.

The issue is (obviously) with the numpy and pandas packages. I used the pre-built packages from the community repo and ran into the same issue as you, so evidently the way they were built introduces the problem. Specifically, if you look under, e.g., numpy/core at the install location (/usr/lib/python3.9/site-packages), you'll find that all the C-extensions have .cpython-39-x86_64-linux-musl in their names. For instance, the module you're having trouble with, numpy.core._multiarray_umath, is named _multiarray_umath.cpython-39-x86_64-linux-musl.so, and not just _multiarray_umath.so. Dropping the .cpython-39-x86_64-linux-musl from those filenames fixed the issue (edit: see the P.S. at the end for details).
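To see which tags the installed extension files actually carry, something like the following could be run inside the container. This is only a sketch: the function name extension_tags is made up for illustration, and it simply counts whatever follows the first dot in each .so file name.

```python
from collections import Counter
from pathlib import Path


def extension_tags(root):
    """Count the suffixes (everything from the first dot on) of .so files under root."""
    tags = Counter()
    for path in Path(root).rglob("*.so"):
        name = path.name
        # "_multiarray_umath.cpython-39-x86_64-linux-musl.so"
        #   -> ".cpython-39-x86_64-linux-musl.so"
        tags[name[name.index("."):]] += 1
    return tags
```

Calling extension_tags("/usr/lib/python3.9/site-packages/numpy/core") and printing the result should show whether the files use the musl tag described above.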

The following line can be added to your Dockerfile after installing py3-pandas and py3-numpy to fix it:

RUN find /usr/lib/python3.9/site-packages -name "*.cpython-39-x86_64-linux-musl.so" -exec sh -c 'mv "$1" "${1%.cpython-39-x86_64-linux-musl.so}.so"' _ {} \;
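If you'd rather do the rename from Python (for example, from a small script run during the image build), here is a rough equivalent. The function name strip_platform_tag and the TAG constant are my own illustrative choices, and the tag value assumes the file names described above.

```python
from pathlib import Path

# Assumption: the tag seen in the file names of the Alpine packages.
TAG = ".cpython-39-x86_64-linux-musl"


def strip_platform_tag(root, tag=TAG):
    """Rename every '<name><tag>.so' under root to '<name>.so'; return the new paths."""
    renamed = []
    # sorted() materialises the matches first, so renaming does not disturb the walk.
    for path in sorted(Path(root).rglob(f"*{tag}.so")):
        target = path.with_name(path.name[: -len(tag + ".so")] + ".so")
        if target.exists():
            continue  # leave an existing untagged module alone
        path.rename(target)
        renamed.append(target)
    return renamed
```

It would be called as strip_platform_tag("/usr/lib/python3.9/site-packages"). Files without the tag are left untouched.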

P.S.: After looking into this further, I found the culprit. For some reason, the Python running in this Alpine image thinks its full platform extension suffix (listed in importlib.machinery.EXTENSION_SUFFIXES) is cpython-39-x86_64-linux-gnu.so rather than cpython-39-x86_64-linux-musl.so. I don't believe it was built against glibc, but who knows. So you could instead change musl to gnu in the names of those shared objects, and that works as well. I'm not sure why the suffix generated when the packages were built differs from the one Python uses at runtime.
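To check this on your own image, you can compare the file names the interpreter would accept for a module against what is on disk. This is a minimal sketch: candidate_filenames is a hypothetical helper, and it only mirrors the idea that the import system looks for the module name followed by one of the entries in EXTENSION_SUFFIXES.

```python
import importlib.machinery


def candidate_filenames(module, suffixes=None):
    """File names under which the interpreter would accept an extension module."""
    if suffixes is None:
        # The running interpreter's own list of extension suffixes.
        suffixes = importlib.machinery.EXTENSION_SUFFIXES
    return [module + suffix for suffix in suffixes]
```

Running print(candidate_filenames("_multiarray_umath")) with the image's Python shows the accepted names. As far as I can tell, the list normally also contains the plain .so suffix, which would explain why dropping the tag entirely works too.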

Upvotes: 7
