Reputation: 963
I'm trying to build a multistage Docker image with some Python packages. For some reason, the pip wheel command still downloads source archives (.tar.gz) for a few packages even though .whl files exist on PyPI. For example, it does this for pandas and numpy.
Here is my requirements.txt:
# REST client
requests
# ETL
pandas
# SFTP
pysftp
paramiko
# LDAP
ldap3
# SMB
pysmb
First stage of the Dockerfile:
ARG IMAGE_TAG=3.7-alpine
FROM python:${IMAGE_TAG} as python-base
COPY ./requirements.txt /requirements.txt
RUN mkdir /wheels && \
apk add build-base openssl-dev pkgconfig libffi-dev
RUN pip wheel --wheel-dir=/wheels --requirement /requirements.txt
ENTRYPOINT tail -f /dev/null
The output below shows that it downloads the source package for pandas, although it got a wheel for requests. It also takes a very long time to download and build these packages.
Step 5/11 : RUN pip wheel --wheel-dir=/wheels --requirement /requirements.txt
---> Running in d7bd8b3bd471
Collecting requests (from -r /requirements.txt (line 4))
Downloading https://files.pythonhosted.org/packages/51/bd/23c926cd341ea6b7dd0b2a00aba99ae0f828be89d72b2190f27c11d4b7fb/requests-2.22.0-py2.py3-none-any.whl (57kB)
Saved /wheels/requests-2.22.0-py2.py3-none-any.whl
Collecting pandas (from -r /requirements.txt (line 7))
Downloading https://files.pythonhosted.org/packages/0b/1f/8fca0e1b66a632b62cc1ae38e197befe48c5cee78f895edf4bf8d340454d/pandas-0.25.0.tar.gz (12.6MB)
I would like to know how I can force it to get a wheel file for all the required packages and also for their dependencies. I observed that some dependencies get a wheel file while others get source packages.
NOTE: code above is a combination of multiple online sources.
Any help to make this build process easier is greatly appreciated.
Thanks in Advance.
Upvotes: 8
Views: 15291
Reputation: 66411
You are using Alpine Linux. It is somewhat unique in that it uses musl as the underlying libc implementation, as opposed to most other Linux distros, which use glibc.
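To check which C library an interpreter appears to be linked against, a minimal sketch using the standard library's platform.libc_ver() could look like this. The helper name describe_libc is hypothetical, and the result is only a hint: an empty name is what is typically reported on musl-based systems, but also on non-Linux platforms.

```python
import platform


def describe_libc():
    """Return a short label for the C library this interpreter appears to use."""
    try:
        name, version = platform.libc_ver()
    except OSError:
        # The interpreter binary could not be inspected.
        name, version = "", ""
    if name == "glibc":
        return f"glibc {version}".strip()
    # Anything else (including an empty name) is treated as "not glibc".
    # This is only a hint, not a reliable musl detector.
    return "not glibc (possibly musl)"


print(describe_libc())
```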
If a Python project implements C extensions (as e.g. numpy or pandas do), it has two options: either ship only a source distribution (.tar.gz, .tar.bz2 or .zip) so that the C extensions are compiled using the C compiler/library found on the target system, or ship a wheel containing precompiled C extensions, which is only usable on systems whose libc matches the one it was built against.
Now, Python defines the manylinux1 platform tag, which is specified in PEP 513 and updated in PEP 571. Basically, the name says it all - wheels with compiled C extensions should be built against glibc and thus will work on many distros (those that use glibc), but not on some (Alpine being one of them).
For you, it means that you have two possibilities: either build packages from source dists (this is what pip already does), or install the prebuilt packages via Alpine's package manager. E.g. for py3-pandas it would mean doing:
# echo "@edge http://dl-cdn.alpinelinux.org/alpine/edge/testing" >> /etc/apk/repositories
# apk update
# apk add py3-pandas@edge
However, I don't see a big issue with building packages from source. When done right, you capture it in a separate layer placed as high as possible in the image, so it is cached and not rebuilt each time.
You might ask why there's no platform tag analogous to manylinux1, but for musl-based distros. Because no one has written a PEP similar to PEP 513 that defines a musllinux platform tag yet. If you are interested in the current state of it, take a look at issue #37.
Update: PEP 656, which defines a musllinux platform tag, is now accepted, so it (hopefully) won't be long before prebuilt wheels for Alpine start to ship. You can track the current implementation state in auditwheel#305.
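As I understand PEP 656, a musllinux tag has the form musllinux_<major>_<minor>_<arch>, and a wheel built for musl x.y is expected to work on the same major version with an equal or newer minor version. A minimal sketch of that rule, with a hypothetical helper name and made-up example inputs (pip performs the real check), might be:

```python
import re

# musllinux_<major>_<minor>_<arch>; the arch itself may contain underscores.
_MUSLLINUX = re.compile(r"^musllinux_(\d+)_(\d+)_(.+)$")


def musllinux_tag_fits(tag, musl_version, arch):
    """Check a platform tag against a system's musl version and architecture.

    musl_version is a (major, minor) tuple, e.g. (1, 2).
    """
    match = _MUSLLINUX.match(tag)
    if match is None:
        # Not a musllinux tag at all (e.g. a manylinux tag).
        return False
    major, minor = int(match.group(1)), int(match.group(2))
    sys_major, sys_minor = musl_version
    return match.group(3) == arch and major == sys_major and minor <= sys_minor


print(musllinux_tag_fits("musllinux_1_1_x86_64", (1, 2), "x86_64"))
print(musllinux_tag_fits("manylinux1_x86_64", (1, 2), "x86_64"))
```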
Upvotes: 6
Reputation: 3957
For Python 3, your packages will be installed from wheels with an ordinary pip call:
pip install pandas numpy
From the docs:
Pip prefers Wheels where they are available. To disable this, use the --no-binary flag for pip install.
If no satisfactory wheels are found, pip will default to finding source archives.
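The opposite of --no-binary is pip's --only-binary option: with the value :all:, pip refuses to fall back to source archives and fails if no compatible wheel exists. A small sketch that assembles such a pip wheel command is shown below; the function name and the wheels_only parameter are hypothetical, and the paths are the ones from the question.

```python
import sys


def pip_wheel_command(requirements, wheel_dir, wheels_only=True):
    """Build the argument list for a `pip wheel` run.

    With wheels_only=True, `--only-binary :all:` is added so pip fails
    instead of downloading and building a source archive.
    """
    cmd = [
        sys.executable, "-m", "pip", "wheel",
        "--wheel-dir", wheel_dir,
        "--requirement", requirements,
    ]
    if wheels_only:
        cmd += ["--only-binary", ":all:"]
    return cmd


print(" ".join(pip_wheel_command("/requirements.txt", "/wheels")))
```

The resulting list can be passed to subprocess.run(cmd, check=True). On a platform without matching wheels this makes the build fail fast instead of silently compiling from source.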
Upvotes: -3