Patrick Huy

Reputation: 995

How to do deterministic builds of Docker images?

I would like my Docker image builds to be deterministic. To my surprise, even a trivial Dockerfile such as

FROM scratch
ENV a b

produces a different image ID every time it is built with docker build --no-cache .

How can I make my builds deterministic, and what's causing the image IDs to change? When caching is enabled, the same ID is produced every time.

The reason I want this reproducibility is to produce the same layers in a distributed build environment. I cannot control where a build runs, so I cannot know what is in the cache. The build also downloads files with wget from an FTP server, and those files may or may not have changed; currently I cannot easily tell Docker from within a Dockerfile whether the result of a RUN should invalidate the cache. If identical layers got the same ID even when no cache is used, those layers would not have to be pushed and pulled again.

Also all the reasons listed here: https://reproducible-builds.org/

Upvotes: 5

Views: 3010

Answers (2)

DeusXMachina

Reputation: 1399

AFAIK, Docker images currently do not hash to the same value across builds, because the image metadata contains stateful information such as the creation date. You can check out the design doc from Docker 1.10. Unfortunately, it looks like the history metadata is an important part of image validity and identification.

Don't get me wrong, I'm all about reproducible builds. However, I don't believe hash-exactness is the best criterion for measuring the reproducibility of a Docker image. A Docker image isn't a compiled binary. There is no way to guarantee that the result of a build step can ever be reproduced, so even if the datetime metadata were absent, that would not guarantee reproducible builds. Take this pathological example:

RUN curl "https://www.random.org/strings/?num=1&len=20&digits=on&unique=on&format=plain&rnd=new" -o nonce.txt
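To illustrate why removing the timestamps from the metadata would not be enough, here is a minimal Python sketch. It assumes a layer is identified by a digest over its tar archive, and it uses a one-file in-memory tar as a stand-in for a real layer. The helper name layer_digest and the file contents are hypothetical, not anything Docker exposes.

```python
import hashlib
import io
import tarfile


def layer_digest(content: bytes, mtime: int = 0) -> str:
    """Hash an in-memory tar holding one file, as a stand-in for a layer."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        info = tarfile.TarInfo(name="nonce.txt")
        info.size = len(content)
        info.mtime = mtime  # file modification time stored in the tar header
        tar.addfile(info, io.BytesIO(content))
    return hashlib.sha256(buf.getvalue()).hexdigest()


# Same bytes and same mtime: the digest is stable.
print(layer_digest(b"abc") == layer_digest(b"abc"))                  # True

# The download returned different bytes: the digest changes.
print(layer_digest(b"abc") == layer_digest(b"xyz"))                  # False

# Same bytes, but the file was written at a different time.
print(layer_digest(b"abc", mtime=1) == layer_digest(b"abc", mtime=2))  # False
```

The last case is the subtle one: even when a RUN step fetches identical bytes, the file modification times recorded in the archive can still make the digest differ between builds.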

Upvotes: 5

gasc

Reputation: 648

The image ID is the SHA256 digest of the image's configuration object (what you see when you run docker image inspect). Run that command on the images you are building and you will see the differences between them.
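As a rough illustration of why two builds of the same Dockerfile get different IDs, here is a Python sketch that hashes a trimmed-down configuration object. The field values and the helpers image_id and make_config are hypothetical, and the sorted compact JSON is only a stand-in for the exact bytes Docker stores and hashes, so the digests will not match real image IDs.

```python
import hashlib
import json


def image_id(config: dict) -> str:
    """Return 'sha256:<hex>' over a JSON serialization of the config.

    Assumption: sorted-key compact JSON stands in for the exact bytes
    that Docker stores and hashes.
    """
    raw = json.dumps(config, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return "sha256:" + hashlib.sha256(raw).hexdigest()


def make_config(created: str) -> dict:
    # Hypothetical, heavily trimmed config for "FROM scratch" + "ENV a b".
    return {
        "architecture": "amd64",
        "os": "linux",
        "config": {"Env": ["a=b"]},
        "created": created,
        "history": [
            {"created": created, "created_by": "ENV a b", "empty_layer": True}
        ],
        "rootfs": {"type": "layers", "diff_ids": []},
    }


first = make_config("2016-03-01T10:00:00Z")
second = make_config("2016-03-01T10:00:05Z")  # same Dockerfile, built 5 s later
pinned = make_config("2016-03-01T10:00:00Z")  # timestamp held constant

print(image_id(first) == image_id(second))  # False
print(image_id(first) == image_id(pinned))  # True
```

Only the timestamps differ between first and second, yet the IDs diverge. With the timestamp held constant, the ID is stable.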

Upvotes: 3
