Reputation: 6179
We have a continuous integration pipeline on circleci that does the following:
The problem is with step 5. The push step takes 5 minutes every time. If I understand it correctly, docker hub is meant to cache layers so we don't have to re-push things like the base image and dependencies if they are not updated.
I ran the build twice in a row, and I see a lot of crossover in the hash of the layers being pushed. Yet rather than "Image already exists" I see "Image successfully pushed".
Here's the output of build 1's docker push, and here's build 2
If you diff those two files you'll see that only 2 layers differ in each build:
< ca44fed88be6: Buffering to Disk
< ca44fed88be6: Image successfully pushed
< 5dbd19bfac8a: Buffering to Disk
< 5dbd19bfac8a: Image successfully pushed
---
> 9136b10cfb72: Buffering to Disk
> 9136b10cfb72: Image successfully pushed
> 0388311b6857: Buffering to Disk
> 0388311b6857: Image successfully pushed
So why is it that all the images have to re-push every time?
Upvotes: 12
Views: 800
Reputation: 4708
The process should work as you described. In fact we're building all of our images in this way without problems. Usually there are just a few changes to the topmost layers and only those are pushed to the registry - otherwise the whole concept of image layers would be useless.
See here for an example. Only the two topmost layers have changed, are pushed for :latest
and for :4.0.2
there's no push at all. We're tagging images with git tags and for some projects we even tag images with git describe
- to get the rollback functionality, just in case.
You can get the project source-code also from GitHub to try it out.
A few things to note about the setup: We're using a self-hosted GitLab CI with a customized runner which runs docker
and docker-compose
on an isolated host with Docker 1.9.1, but that should not make any difference.
There may be also differences in the registry version, I had the feeling (but I am not 100% sure) that some older repos on DockerHub are still running on registry v1
, newer ones always on v2
- so you may try creating a new repo and see if the issue still occurs.
Please note that the behavior for tags described above does only apply when pushing the same image-name, if you push the same image layers with another name, you always need to push all layers, despite the fact that all layers should already exists on the registry, so I guess repo/image:mytag1
and repoimage:mytag2
actually go to repo/image
and the missing slash is just a typo.
Another cause could be that your images are built on different hosts on Circle CI, but then you should also get different layer IDs, so I think this is not very likely.
I suggest to build an image manually and try to reproduce the problem or contact Circle CI about this issue.
Upvotes: 0
Reputation: 43487
Using a different tag creates a different image which, when pushed, cannot rely on the cache.
For example the two commands:
$ docker commit -m "thing" -a "me" db65bf421f96 me/thing:v1
$ docker commit -m "thing" -a "me" db65bf421f96 me/thing:v2
yield utterly distinctimages even though they were created from identical images (db65bf421f96). When pushed, dockerhub must treat them as completely separate images as can be seen with:
$ docker images
REPOSITORY TAG IMAGE ID
me/thing v2 f14aa8ac6bae
me/thing v1 c7d72ccc1d71
The image IDs are unique and thus the images are unique even only if they vary in tags.
You could say "docker should recognize them as being bit for bit identical" and thus treat them as cachable. But it doesn't (yet).
The only surprise for me in your example is that you got any duplicate image IDs at all.
Authoritative (if less explanatory) documentation can be found at docker in "Build your own images".
Upvotes: 1