Erasmus

Reputation: 654

In docker with buildkit and run --mount, why is cabal install Downloading cached packages?

I am in the process of creating a Dockerfile that can build a haskell program. The Dockerfile uses ubuntu focal as a base image, installs ghcup, and then builds a haskell program. There are multiple reasons why I am doing this; it can support a low-configuration CI environment, and it can help new developers who are trying to build a complicated project.

In order to speed up build times, I am using docker v20 with buildkit. I have a sequence of events like this (it's quite a long file, but this excerpt is the relevant part):

# installs haskell
WORKDIR $HOME
RUN git clone https://github.com/haskell/ghcup-hs.git
WORKDIR ghcup-hs
RUN BOOTSTRAP_HASKELL_NONINTERACTIVE=NO ./bootstrap-haskell
#RUN source ~/.ghcup/env  # Uh-oh: can't do this.
# We recreate the contents of ~/.ghcup/env
ENV PATH=$HOME/.cabal/bin:$HOME/.ghcup/bin:$PATH

# builds application
COPY application $HOME/application
WORKDIR $HOME/application
RUN mkdir -p logs
RUN --mount=type=cache,target=$HOME/.cabal \
    --mount=type=cache,target=$HOME/.ghcup \
    --mount=type=cache,target=$HOME/application/dist-newstyle \
    cabal build |& tee logs/configure.log

But when I change some non-code files (README.md for example) in application, and build my docker image ...

DOCKER_BUILDKIT=1 docker build -t application/application:1.0 .

... it takes quite a bit of time and the output from cabal build includes a lot of Downloading [blah] followed by Building/Installing/Completed messages from cabal install.

However when I go into my container and type cabal build, it is much faster (it is already built):

host$ docker run -it application/application:1.0
container$ cabal build  # this is fast

I would expect it to be just as fast in the prior case as well, since I have not really changed the code files, the dependencies are all downloaded, and I am using RUN --mount.

Are there files somewhere that my --mount=type=cache entries are not covering? Is there a package registry file somewhere that I need to include in its own --mount=type=cache line? As far as I can tell, my builds ought to be nearly instant instead of taking several minutes to complete.

Upvotes: 4

Views: 442

Answers (1)

Kevin Peña

Reputation: 772

A few years later, but I think I have the answer.

One issue with the OP's approach is that $HOME is not replaced with anything in that context, so the target directory for the cache includes a folder literally named $HOME.
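
To check where such a mount actually lands, a throwaway Dockerfile along these lines may help. This is only a diagnostic sketch: the ubuntu:focal base is taken from the question, and reading /proc/self/mountinfo assumes a Linux build container. It prints whatever mount path BuildKit created for the target, without assuming what that path will be.

```dockerfile
# Diagnostic sketch only; build with:
#   DOCKER_BUILDKIT=1 docker build --progress=plain --no-cache .
FROM ubuntu:focal

# Mount a cache at the same target the question uses, then list every
# mount point whose path mentions "cabal" so the resolved target is visible.
RUN --mount=type=cache,target=$HOME/.cabal \
    grep cabal /proc/self/mountinfo || echo "no mount path contains 'cabal'"
```

If the printed path is not the directory cabal really writes to, the cache mount is not helping the build.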

The next step was finding the right folders. You can do this by inspecting an image generated with the "wrong" configuration (i.e. without any caching). With a tool like dive, you can see that the changes in that layer include:

drwx------         0:0      14 MB  └── root
drwxr-xr-x         0:0     461 kB      ├── .cache
drwxr-xr-x         0:0     461 kB      │   └── cabal
drwxr-xr-x         0:0      11 kB      │       ├── logs
-rw-r--r--         0:0      485 B      │       │   ├── build.log
drwxr-xr-x         0:0      11 kB      │       │   └── ghc-9.8.4
-rw-r--r--         0:0      11 kB      │       │       └── text-2.1.2-030cac37f8d77dcf6263d580f
drwxr-xr-x         0:0     450 kB      │       └── packages
drwxr-xr-x         0:0     450 kB      │           └── hackage.haskell.org
drwxr-xr-x         0:0     450 kB      │               └── text
drwxr-xr-x         0:0     450 kB      │                   └── 2.1.2
-rw-r--r--         0:0     450 kB      │                       └── text-2.1.2.tar.gz
drwxr-xr-x         0:0      13 MB      └── .local
drwxr-xr-x         0:0      13 MB          └── state
drwxr-xr-x         0:0      13 MB              └── cabal
drwxr-xr-x         0:0      13 MB                  └── store
drwxr-xr-x         0:0      13 MB                      └── ghc-9.8.4-1b19
drwxr-xr-x         0:0        0 B                          ├── incoming
-rw-r--r--         0:0        0 B                          │   └── text-2.1.2-030cac37f8d77dcf6
drwxr-xr-x         0:0      14 kB                          ├── package.db
-rw-r--r--         0:0     8.6 kB                          │   ├── package.cache
-rw-r--r--         0:0        0 B                          │   ├── package.cache.lock
-rw-r--r--         0:0     5.1 kB                          │   └── text-2.1.2-030cac37f8d77dcf6
drwxr-xr-x         0:0      13 MB                          └── text-2.1.2-030cac37f8d77dcf6263d
-rw-r--r--         0:0      497 B                              ├── cabal-hash.txt
drwxr-xr-x         0:0      13 MB                              ├── lib
drwxr-xr-x         0:0     2.5 MB                              │   ├── Data
drwxr-xr-x         0:0     2.1 MB                              │   │   ├── Text
-rw-r--r--         0:0      12 kB                              │   │   │   ├── Array.dyn_hi
-rw-r--r--         0:0      12 kB                              │   │   │   ├── Array.hi
(more changes omitted)

From this we can see that the files were actually being stored in /root/.cache/cabal and /root/.local/state/cabal/store. At least, that is the case for this combination of GHC and cabal-install; for this example I'm using haskell:9.8.4 as the base image.

With all this in mind, here is a full, multi-stage Dockerfile that leverages the build cache (the executable is called example):

FROM haskell:9.8.4 AS builder

WORKDIR /app

COPY . .
RUN --mount=type=cache,target=/root/.local/state/cabal/store \
    --mount=type=cache,target=/root/.cache/cabal \
    --mount=type=cache,target=./dist-newstyle \
    cabal update && \
    mkdir bin && \
    cabal install --install-method=copy --installdir=./bin

FROM debian:12-slim
COPY --from=builder /app/bin/example /usr/local/bin/
CMD ["example"]
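
A possible variation on the builder stage, sketched under assumptions rather than tested against the OP's project: copy only the package description first and build the dependencies in their own step, so that editing a file such as README.md does not invalidate that layer. It assumes a single *.cabal file at the project root and no cabal.project file, and it reuses the cache directories found above for haskell:9.8.4.

```dockerfile
FROM haskell:9.8.4 AS builder

WORKDIR /app

# Assumption: the package description is a single *.cabal file in the project root.
# Copying it alone keeps this layer valid when only other files change.
COPY *.cabal ./
RUN --mount=type=cache,target=/root/.local/state/cabal/store \
    --mount=type=cache,target=/root/.cache/cabal \
    cabal update && \
    cabal build --only-dependencies

# Now copy the rest of the sources and build the executable itself.
# cabal update is repeated in case the cache mount was pruned in the meantime.
COPY . .
RUN --mount=type=cache,target=/root/.local/state/cabal/store \
    --mount=type=cache,target=/root/.cache/cabal \
    --mount=type=cache,target=/app/dist-newstyle \
    cabal update && \
    mkdir bin && \
    cabal install --install-method=copy --installdir=./bin
```

The final stage that copies /app/bin/example out of the builder would stay as it is.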

Upvotes: 0
