jtmarmon

Reputation: 6179

Docker hub image cache doesn't seem to be working

We have a continuous integration pipeline on CircleCI that does the following:

  1. Loads repo/image:mytag1 from the cache directory to be able to use cached layers
  2. Builds a new version: docker build -t repoimage:mytag2
  3. Saves the new version to the cache directory with docker save
  4. Runs tests
  5. Pushes to docker hub: docker push repo/image:mytag2
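The five steps above could be scripted roughly as follows. This is a minimal sketch, not the actual CircleCI config: `CACHE_DIR`, `IMAGE` and the placeholder `run_tests` are hypothetical names, the build context is assumed to be the current directory, and setting `DOCKER=echo` turns it into a dry run.

```bash
#!/usr/bin/env bash
# Hypothetical settings; override them from the environment.
DOCKER="${DOCKER:-docker}"                    # set DOCKER=echo for a dry run
CACHE_DIR="${CACHE_DIR:-$HOME/docker-cache}"  # hypothetical cache location
IMAGE="${IMAGE:-repo/image}"                  # hypothetical repository name

# Placeholder: replace with the project's real test command.
run_tests() { true; }

ci_pipeline() {
  local new_tag=$1
  local cache="$CACHE_DIR/image.tar"

  # 1. Restore layers saved by the previous build, if a cache file exists.
  if [[ -f "$cache" ]]; then
    $DOCKER load -i "$cache" || return 1
  fi

  # 2. Build the new version from the current directory.
  $DOCKER build -t "$IMAGE:$new_tag" . || return 1

  # 3. Save the new image (including its parent layers) for the next run.
  mkdir -p "$CACHE_DIR" || return 1
  $DOCKER save -o "$cache" "$IMAGE:$new_tag" || return 1

  # 4. Run the tests.
  run_tests || return 1

  # 5. Push the new tag to Docker Hub.
  $DOCKER push "$IMAGE:$new_tag"
}
```

Usage: `ci_pipeline mytag2`.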

The problem is with step 5: the push takes 5 minutes every time. If I understand correctly, Docker Hub is meant to cache layers so we don't have to re-push things like the base image and dependencies when they haven't changed.

I ran the build twice in a row, and I see a lot of crossover in the hash of the layers being pushed. Yet rather than "Image already exists" I see "Image successfully pushed".

Here's the output of build 1's docker push, and here's build 2's.

If you diff those two files you'll see that only 2 layers differ in each build:

< ca44fed88be6: Buffering to Disk
< ca44fed88be6: Image successfully pushed
< 5dbd19bfac8a: Buffering to Disk
< 5dbd19bfac8a: Image successfully pushed
---
> 9136b10cfb72: Buffering to Disk
> 9136b10cfb72: Image successfully pushed
> 0388311b6857: Buffering to Disk
> 0388311b6857: Image successfully pushed
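To make that comparison repeatable, each push log can be reduced to the set of layer IDs that were actually uploaded. A minimal bash sketch, assuming the logs are saved to files and contain `<id>: Image successfully pushed` lines like the ones above (newer Docker clients word their messages differently); the function names are hypothetical:

```bash
#!/usr/bin/env bash
# Print the IDs of layers that a push log reports as uploaded,
# sorted and de-duplicated so they can be compared with comm.
pushed_layers() {
  sed -n 's/^\([0-9a-f]\{12\}\): Image successfully pushed$/\1/p' "$1" | sort -u
}

# Print layers uploaded in BOTH builds: unchanged layers that were sent
# again instead of being reported as "Image already exists".
repushed_layers() {
  comm -12 <(pushed_layers "$1") <(pushed_layers "$2")
}
```

`repushed_layers build1.log build2.log` then lists the unchanged layers that were uploaded a second time; any output there is the symptom described in the question.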

So why do all the layers have to be re-pushed every time?

Upvotes: 12

Views: 800

Answers (2)

schmunk

Reputation: 4708

The process should work as you described. In fact, we build all of our images this way without problems. Usually only the topmost layers change, and only those are pushed to the registry; otherwise the whole concept of image layers would be useless.
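With registry v2, layers are stored as blobs addressed by the digest of their content, which is why an unchanged layer does not need to be uploaded again. The following is a toy model of that idea in bash, not Docker's implementation: `REGISTRY_DIR` is a hypothetical local directory standing in for the registry, the script assumes GNU coreutils' `sha256sum`, and the messages only loosely mimic `docker push` output.

```bash
#!/usr/bin/env bash
# Hypothetical local "registry": one file per blob, named by its SHA-256.
REGISTRY_DIR="${REGISTRY_DIR:-./toy-registry}"

# "Push" a layer file: upload it only if no blob with the same digest exists.
push_layer() {
  local blob=$1 digest
  digest=$(sha256sum < "$blob") || return 1
  digest=${digest%% *}                # keep only the hex digest
  mkdir -p "$REGISTRY_DIR" || return 1
  if [[ -e "$REGISTRY_DIR/$digest" ]]; then
    echo "${digest:0:12}: Layer already exists"
  else
    cp "$blob" "$REGISTRY_DIR/$digest" || return 1
    echo "${digest:0:12}: Pushed"
  fi
}
```

Pushing the same file twice prints `Pushed` the first time and `Layer already exists` the second, because the digest depends only on the bytes, not on the tag or file name.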

See here for an example: only the two topmost layers changed, so only those were pushed for :latest, and for :4.0.2 nothing was pushed at all. We tag images with git tags, and for some projects we even tag them with the output of git describe to get rollback functionality, just in case.
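A sketch of that tagging scheme, assuming the image has already been built as `$IMAGE:<build tag>`; `IMAGE` and `push_release` are hypothetical names, and `DOCKER=echo` gives a dry run. Because every tag lives in the same repository, the pushes after the first should find the layers already on the registry. Older Docker releases such as 1.9 needed `docker tag -f` to move an existing tag like `:latest`.

```bash
#!/usr/bin/env bash
DOCKER="${DOCKER:-docker}"        # set DOCKER=echo for a dry run
IMAGE="${IMAGE:-repo/image}"      # hypothetical repository name

# Tag an already-built image as :latest and as a version string, then push
# both tags to the same repository. The version defaults to git describe.
push_release() {
  local build_tag=$1
  local version=${2:-$(git describe --tags --always)}
  local tag
  for tag in latest "$version"; do
    $DOCKER tag "$IMAGE:$build_tag" "$IMAGE:$tag" || return 1
    $DOCKER push "$IMAGE:$tag" || return 1
  done
}
```

Usage: `push_release mytag2`, or `push_release mytag2 4.0.2` to set the version explicitly.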

The project's source code is also available on GitHub if you want to try it out.

A few things to note about our setup: we're using a self-hosted GitLab CI with a customized runner that runs docker and docker-compose on an isolated host with Docker 1.9.1, but that should not make any difference.

There may also be differences in the registry version. I have the impression (but I am not 100% sure) that some older repos on Docker Hub still run on registry v1, while newer ones always run on v2, so you could try creating a new repo and see whether the issue still occurs.

Please note that the tag behavior described above only applies when pushing under the same image name. If you push the same layers under another name, all layers are always pushed again, even though they should already exist on the registry. So I assume repo/image:mytag1 and repoimage:mytag2 both actually go to repo/image and the missing slash is just a typo.
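Since layer reuse depends on pushing to the same repository, a quick sanity check is to compare the repository part of the two references. A small bash sketch with hypothetical function names; it treats a colon as a tag separator only when it appears after the last `/`, so registry ports survive, and it does not handle `@sha256:` digests or normalize `docker.io/` prefixes.

```bash
#!/usr/bin/env bash
# Strip the tag from an image reference of the form name[:tag].
repo_of() {
  local ref=$1
  if [[ ${ref##*/} == *:* ]]; then   # colon after the last "/" marks a tag
    echo "${ref%:*}"
  else
    echo "$ref"
  fi
}

# Succeed when both references name the same repository.
same_repo() {
  [[ "$(repo_of "$1")" == "$(repo_of "$2")" ]]
}
```

With the references from the question, `same_repo repo/image:mytag1 repoimage:mytag2` fails, which would expose the suspected typo.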

Another cause could be that your images are built on different hosts on CircleCI, but then you should also get different layer IDs, so I think this is unlikely.

I suggest building an image manually and trying to reproduce the problem, or contacting CircleCI about the issue.
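A minimal manual reproduction along those lines, meant to be run from the project directory on a machine logged in to Docker Hub. `IMAGE`, `repro_push_cache` and the `repro1`/`repro2` tags are hypothetical; the count simply greps the second push's output for "already exists", which matches both the "Image already exists" message from the question and the "Layer already exists" wording of newer clients.

```bash
#!/usr/bin/env bash
DOCKER="${DOCKER:-docker}"
IMAGE="${IMAGE:-repo/image}"      # hypothetical repository name

# Build and push the same Dockerfile under two tags in a row, then report
# how many layers the second push reused instead of uploading.
repro_push_cache() {
  local log
  $DOCKER build -t "$IMAGE:repro1" . || return 1
  $DOCKER push "$IMAGE:repro1" || return 1
  $DOCKER build -t "$IMAGE:repro2" . || return 1
  log=$($DOCKER push "$IMAGE:repro2") || return 1
  printf '%s\n' "$log"
  echo "layers reused on second push: $(grep -c 'already exists' <<<"$log")"
}
```

If the last line reports 0 even though nothing changed between the builds, the problem is reproducible outside CircleCI.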

Upvotes: 0

msw

Reputation: 43487

Using a different tag creates a different image which, when pushed, cannot rely on the cache.

For example the two commands:

$ docker commit -m "thing" -a "me" db65bf421f96 me/thing:v1
$ docker commit -m "thing" -a "me" db65bf421f96 me/thing:v2

yield utterly distinct images even though they were created from the identical image (db65bf421f96). When pushed, Docker Hub must treat them as completely separate images, as can be seen with:

$ docker images
REPOSITORY     TAG      IMAGE ID
me/thing       v2       f14aa8ac6bae
me/thing       v1       c7d72ccc1d71

The image IDs differ, so the images are distinct even if they only vary in their tags.
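To check whether two references really resolve to the same image, their IDs can be compared directly with `docker inspect`. A small sketch; `same_image` is a hypothetical helper, and a failed lookup is reported the same way as a mismatch.

```bash
#!/usr/bin/env bash
DOCKER="${DOCKER:-docker}"

# Succeed only if both references resolve to the same image ID.
same_image() {
  local a b
  a=$($DOCKER inspect --format '{{.Id}}' "$1") || return 1
  b=$($DOCKER inspect --format '{{.Id}}' "$2") || return 1
  [[ "$a" == "$b" ]]
}
```

For the two commits above, `same_image me/thing:v1 me/thing:v2` fails because their IDs differ.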

You could say "Docker should recognize them as bit-for-bit identical" and therefore treat them as cacheable. But it doesn't (yet).

The only surprise for me in your example is that you got any duplicate image IDs at all.

Authoritative (if less explanatory) documentation can be found in the Docker docs under "Build your own images".

Upvotes: 1
