jtlz2

Reputation: 8437

How do I cache a python package Docker build for Alpine Linux?

A related question asks "Why is a Pandas build slow on Alpine Linux?":

Why does it take ages to install Pandas on Alpine Linux

I would like to know how to work around this (the answers do not say), i.e. how to cache an Alpine build in order to recycle an arbitrary, compiled python module for use in another Docker build.

Such a prebuilt module could be hosted in a private repo. How would a Dockerfile fetch this?

I am specifically interested in a solution for pandas, but it would be perfectly fine to cast the net wider.

Thanks for all help.

Upvotes: 1

Views: 898

Answers (1)

Isaac Rosado

Reputation: 1039

Split the build into separate RUN steps. Each step reuses the cached layer from the previous build as long as its instruction is unchanged; as soon as one line changes, the cache chain is invalidated, and that step and every step after it are executed again. So keep slow-changing things at the top and frequently changing things towards the bottom.

For example, the contents of your Dockerfile could have:

FROM python:2.7-alpine

RUN apk add --update bash curl
RUN apk add gcc make linux-headers musl-dev openldap-dev libxml2-dev libxslt-dev libffi-dev pcre-dev
RUN apk add cython
RUN pip install pandas
#RUN install your package/library

With the above example, you will see output lines like the following (note the ones that say "Using cache"):

Sending build context to Docker daemon  56.83kB
Step 1/11 : FROM python:2.7-alpine
 ---> b630f364abf4
Step 2/11 : RUN apk add --update bash curl
 ---> Using cache
 ---> a611e4bbdbae
Step 3/11 : RUN apk add gcc make linux-headers musl-dev openldap-dev libxml2-dev libxslt-dev libffi-dev pcre-dev
 ---> Using cache
 ---> 87e91533771d
Step 4/11 : RUN apk add cython
 ---> Using cache
 ---> 47e0fd345aa8
Step 5/11 : RUN pip install pandas
 ---> Running in c57947f606e5

Every "Using cache" output line indicates that the step immediately above it was not executed; its result was taken from the cached layer instead.

The first build executes every step, but subsequent builds will be much faster (assuming everything else remains the same on the host).
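The same ordering rule applies to your own files. As a minimal sketch, assuming a hypothetical requirements.txt that lists pandas and an application directory copied to /app (neither is part of the example above), the dependency list can be copied and installed before the rest of the source, so that editing application code does not invalidate the slow pip layer:

```dockerfile
FROM python:2.7-alpine

# Build tools change rarely, so they go first (package list is an assumption).
RUN apk add --no-cache gcc g++ make musl-dev linux-headers

# Copy only the dependency list; this layer is rebuilt only when
# requirements.txt (hypothetical file) changes.
COPY requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt

# Application code changes often, so it is copied last.
COPY . /app
```

With this layout, a change to a file under /app should only re-run the final COPY step.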

Now you can "docker push <BASE IMAGE with pandas>:<TAG>" to your private (or even public) registry and start other builds with a Dockerfile that begins with:

FROM <BASE IMAGE with pandas>:<TAG>
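As a rough sketch of such a downstream Dockerfile, assuming the pandas image was pushed as registry.example.com/pandas-alpine:2.7 (a hypothetical name; substitute your own registry, image and tag) and that the application entry point is a hypothetical app.py:

```dockerfile
# Hypothetical image reference: the prebuilt image that already contains pandas.
FROM registry.example.com/pandas-alpine:2.7

WORKDIR /app

# Only the application is added here; pandas is not recompiled.
COPY . /app

# app.py is a placeholder for your own entry point.
CMD ["python", "app.py"]
```

If the registry is private, the build host normally has to authenticate with "docker login" before the FROM line can pull the image.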

The above is known as the "parent / base image" or "builder" paradigm; you can read more at https://docs.docker.com/develop/develop-images/baseimages/

More recently, Docker has also added "multi-stage builds", which are essentially the same idea but simplified so that you can use a single Dockerfile: https://docs.docker.com/develop/develop-images/multistage-build/
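One way a multi-stage build might look for pandas is sketched below. It is an illustration under assumptions, not a tested recipe: the stage name "builder", the /wheels directory, and the exact apk package lists are choices made for this example. The first stage compiles wheels with "pip wheel"; the second stage installs them without needing a compiler.

```dockerfile
# Stage 1: compile pandas and its dependencies into wheel files.
FROM python:2.7-alpine AS builder
# Compiler toolchain (assumed package list; adjust to what your build needs).
RUN apk add --no-cache gcc g++ make musl-dev linux-headers
RUN pip wheel --wheel-dir=/wheels pandas

# Stage 2: final image without the build tools.
FROM python:2.7-alpine
# Assumption: the compiled extensions need the C++ runtime library.
RUN apk add --no-cache libstdc++
COPY --from=builder /wheels /wheels
# Install only from the prebuilt wheels, never from the network.
RUN pip install --no-index --find-links=/wheels pandas
```

The wheel files in /wheels could also be uploaded to a private package index and installed from there, which is another way to reuse the compiled module across builds.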

Upvotes: 2
