Reputation: 77484
Suppose you have a Docker image and you flatten or squash it to reduce its size. This is beneficial for runtime artifacts, since they then consume minimal resources for storage and for pushing / pulling.
But I am wondering if there is a trade-off between flattening (producing a single squashed layer) and layer re-use that occurs on the part of a container registry when you push your image to store it.
Here's an example: Suppose you have an image with several layers -- a regular old Docker image -- and it is maybe 500 MB in size. You use squashing or flattening to compress it into a single layer which is maybe 250 MB in size.
Now let's say you need to make a change to your image, create a version 2. Version 2 is a very minor change in a late layer of the image, maybe changing the name of a settings file right before the CMD instruction or something.
In the case where you had pushed a bunch of expanded layers to the registry, when you go to push this new image, only the differing final layer will need to be stored in the registry's cache, which maybe will mean the total size (for the initial image and your new version 2 image together) will be, say 550 MB or something, depending on that last layer that changed.
Meanwhile, in the case where you flattened it, your new version 2 image is just some completely new single-layer image, with no history in common with the original image. (Maybe your local Docker instance can see the layer history relevant to the flattening, but the registry doesn't have it.)
In this case, you'll have to store roughly 500 MB in the registry: 250 MB each for the first and second versions of the image.
As soon as we do this a third time, the total space of the flattened images (roughly 750 MB for three 250 MB images) is larger than the space of incremental changes to expanded-layer images (roughly 600 MB, if each change adds about 50 MB).
Is there something I am missing about the way that this works? It suggests you would only want to perform the flattening at the moment before you ship the container to its final destination for usage -- but you would not generally want to do the flattening when storing in a registry.
There could be corner cases where the base image is so large and the flattening gives so much size reduction that it's worthwhile, but I am trying to understand the general case, and I cannot find documentation that discusses this particular aspect of layer flattening.
Upvotes: 2
Views: 1165
Reputation: 264761
Squashing an image does remove the ability to use cached image layers and does increase the disk space used when you have multiple copies of the image. For this reason I've yet to see it used with my clients. The preferred way to do this is to structure the Dockerfile to maximize reuse of the cache from previous builds of an image.
If you are seeing a 50% reduction in image size from a squash, there's often a better way to structure the Dockerfile to avoid the layer bloat. The common situation I know of that squashing improves is when you need to copy a large file from the context with a COPY and then modify or later delete that file in a future RUN command. There's no way to chain a COPY and RUN command together. You may be able to convert the COPY to a RUN curl http://local-artifact-repo/... command. Or with multi-stage builds, you can now perform all the COPY and other RUN commands in one stage, and then COPY the result in the final image. The last COPY would result in an entirely new layer even if you only made a minor change, but so would chaining the commands in a RUN.
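As a rough sketch of the multi-stage approach, something like the following keeps the large file out of the final image's layers. The archive name, the /opt/app path, the start command, and the base image are all placeholders I made up for illustration, so adjust them to your build:

```dockerfile
# Stage 1: the large file and the cleanup live only in this throwaway stage.
# big-archive.tar.gz and /opt/app are hypothetical names.
FROM debian:bookworm-slim AS build
COPY big-archive.tar.gz /tmp/
RUN mkdir -p /opt/app \
 && tar -xzf /tmp/big-archive.tar.gz -C /opt/app \
 && rm /tmp/big-archive.tar.gz

# Stage 2: only the extracted result is copied in, as a single new layer.
# The base image layers stay shared with other images built from the same base.
FROM debian:bookworm-slim
COPY --from=build /opt/app /opt/app
# Hypothetical entry point; replace with your own.
CMD ["/opt/app/bin/start"]
```

The final image never contains the layer holding the original archive, so there is nothing left for a squash to reclaim, and the base image layers can still be reused by the registry across versions.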
Upvotes: 2