dwjbosman

Reputation: 966

How to get git clone to play nice with Docker cache?

When I clone a repository twice, e.g.:

git clone <repo_X> --depth 1 clone1
git clone <repo_X> --depth 1 clone2

and then run a diff:

diff -r clone1 clone2

it shows differences:

Binary files clone1/.git/index and clone2/.git/index differ
...
diff -r clone1/.git/logs/HEAD clone2/.git/logs/HEAD
...
diff -r clone1/.git/logs/refs/remotes/origin/HEAD 
...

It seems that, among other things, the time of cloning is recorded in a file.

I want to add some repositories to a Docker image. Docker uses its cache when the files have not changed. Unfortunately, after a clone Docker always invalidates the cache because of the changed files.

  1. Is it somehow possible to have two clones of a repo result in exactly the same files? (Note: I don't want to remove the .git directory, as I want to be able to use git inside the image to check the version of the repo.)

  2. Is it possible to let Docker ignore the .git folder when it comes to caching? (Note that the .git folder must still be added to the image, so .dockerignore is not an option.)

Upvotes: 8

Views: 2914

Answers (2)

hakre

Reputation: 198204

Use git-archive(1) instead of git-clone(1), at least for the final step. The archive also records the revision that was archived, and you can place a marker in a file that holds the revision, e.g. create a flag file.

For repositories where you need a full clone, clone them at the same time (or reset the metadata after cloning to a baseline).

Normally a full clone should not be necessary. If it is, you can also consider a tar pipe in the process, taking care to normalize the result so that the cache stays effective (invalidated only when needed).
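A minimal sketch of the git-archive approach, to show that two exports of the same commit come out identical. The throwaway repository, the demo identity and the flag file name REVISION are assumptions made for illustration only; since an export has no .git directory, the flag file is what lets you check the version later.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)

# Hypothetical throwaway repository standing in for <repo_X>.
git init -q "$work/repo"
git -C "$work/repo" config user.email "demo@example.com"
git -C "$work/repo" config user.name "Demo"
echo "hello" > "$work/repo/file.txt"
git -C "$work/repo" add file.txt
git -C "$work/repo" commit -q -m "initial commit"

rev=$(git -C "$work/repo" rev-parse HEAD)

# Export the same commit twice through a tar pipe and drop a flag file
# (REVISION is an illustrative name) holding the archived revision.
for n in 1 2; do
    mkdir "$work/export$n"
    git -C "$work/repo" archive --format=tar HEAD | tar -xf - -C "$work/export$n"
    echo "$rev" > "$work/export$n/REVISION"
done

# Compare the two exports recursively.
if diff -r "$work/export1" "$work/export2" >/dev/null; then
    result="identical"
else
    result="different"
fi
echo "$result"
```

In a Docker build the exported directory would be what gets added to the image, and `cat REVISION` would take the place of running git inside the container.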

Upvotes: 1

Arty

Reputation: 16765

You can use the cache mount feature (--mount=type=cache) of Docker's BuildKit. A toy example of a Dockerfile:

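FROM ubuntu
# Keep apt's download cache in a BuildKit cache mount between builds.
RUN --mount=type=cache,target=/var/cache/apt \
    apt update && apt upgrade -y && apt install -yq git
RUN echo A00
# Clone into the cache mount only if the repo is not there yet, then pull.
# The cache mount is not part of the image, so copy the repo to /tmp/my_qtox/.
RUN --mount=type=cache,target=/tmp/git_cache/ \
    { [ -d /tmp/git_cache/qtox/.git ] || \
      git clone --depth=1 https://github.com/qtox/qtox/ /tmp/git_cache/qtox/; } && \
    cd /tmp/git_cache/qtox/ && git pull && cp -r ./ /tmp/my_qtox/
RUN echo B00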

The Dockerfile above can be built with this command:

sudo env DOCKER_BUILDKIT=1 docker build -f Dockerfile .

Notice the DOCKER_BUILDKIT=1 environment variable. It is needed to enable BuildKit's features inside docker build (on Docker versions where BuildKit is not already the default builder). BuildKit's features are described in the Docker documentation.

As an example, I cloned the qTox repo above because it is quite large.

The cache mount feature automatically creates a directory meant for caching and mounts it at /tmp/git_cache/ (the target) inside the container for the duration of that RUN step. If an earlier layer changes, e.g. echo A00 is changed to echo A01, the clone step runs again but finishes almost immediately, because the repository is taken from the cache.

Also, as you requested, using this cache keeps the cloned repository the same between builds. Only when new commits appear in the remote repository does git pull change the cached repository. Unless there are new commits, the cached repository stays the same, so you get an identical git repo every time you run docker build again.
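The clone-then-pull behaviour can be sketched outside Docker as a plain shell script. This is only an illustration under assumptions: the local upstream repository, the demo identity and the function name sync_repo are made up for the example, and it only checks that HEAD stays put until the upstream gets a new commit.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)

# Hypothetical local "upstream" standing in for the remote repository.
git init -q "$work/upstream"
git -C "$work/upstream" config user.email "demo@example.com"
git -C "$work/upstream" config user.name "Demo"
echo "v1" > "$work/upstream/file.txt"
git -C "$work/upstream" add file.txt
git -C "$work/upstream" commit -q -m "first commit"

# Clone on the first run, pull on later runs (sync_repo is an illustrative name).
sync_repo() {
    if [ -d "$2/.git" ]; then
        git -C "$2" pull -q --ff-only
    else
        git clone -q --depth=1 "$1" "$2"
    fi
}

url="file://$work/upstream"
cache="$work/git_cache/repo"   # plays the role of the cache mount

sync_repo "$url" "$cache"
first=$(git -C "$cache" rev-parse HEAD)
sync_repo "$url" "$cache"
second=$(git -C "$cache" rev-parse HEAD)

# A new upstream commit is the only thing that moves the cached repo.
echo "v2" > "$work/upstream/file.txt"
git -C "$work/upstream" commit -q -a -m "second commit"
upstream_head=$(git -C "$work/upstream" rev-parse HEAD)

sync_repo "$url" "$cache"
third=$(git -C "$cache" rev-parse HEAD)

echo "unchanged upstream: $first -> $second"
echo "after new commit:   $third"
```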

Docker only rarely deletes the cached directory automatically: when it has not been used for a long time, or when free disk space is low.

As you can see from the Dockerfile above, the final git repo ends up in the /tmp/my_qtox/ folder of the container. You can change this path to whatever you need for your case.

You may also have noticed that I used the same caching mechanism when installing APT packages. This is very handy: when the image is rebuilt, the packages are not downloaded again from the remote Ubuntu server but taken from the cached directory. It helps when layers before apt install have changed or when you add new packages to the installation list; in both cases apt install re-runs very quickly.

Upvotes: 6
