Mickalot

Reputation: 2751

How to create a tar file that omits timestamps for its contents?

Is there a way to create a .tar file that omits the values of atime/ctime/mtime for its files/directories?

Why do we want to do this?

We have a step in our build process that generates a directory of artifacts that gets packaged into a tarfile. We expect that build step to be idempotent -- given the same inputs, it produces exactly the same files/output each time.

Ideally, we would also like the step to be bitwise idempotent across clean builds, so that we can use hashes of successive builds to check that nothing has changed. But because tar files include timestamps (atime/ctime/mtime) for each entry, the tar files created by that build step are never bitwise identical to the previous run, even though the contents of every file inside the archive are bitwise identical.

Is there a way to generate a tarfile that omits the timestamps of its entries, so that the step that generates the archive could be bitwise idempotent? (We want to leverage other file metadata that tar preserves, such as file mode bits and symlinks.)

Upvotes: 52

Views: 21419

Answers (4)

Adracus

Reputation: 1009

To get a truly idempotent tar, fixing mtime is a good step but not enough. You also need to fix the sort order, the owner and group (names together with their numeric IDs), and give mtime an explicit timezone (otherwise the date is interpreted in the local timezone, and you will run into differences between machines, e.g. between Mac and Linux).

I ended up with

tar --sort=name --owner=root:0 --group=root:0 --mtime='UTC 1980-02-01' ... | gzip -n
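
One way to check that these flags are enough is to archive the same tree twice, changing only the on-disk timestamps in between, and compare hashes. This is a minimal sketch, assuming GNU tar 1.28+ (for --sort), GNU touch and sha256sum; the temporary directory layout and the archive_hash helper are made up for illustration:

```shell
#!/bin/sh
# Sketch: check that the archive is byte-identical across two "builds".
# Assumes GNU tar 1.28+, GNU touch and sha256sum. All paths are hypothetical.
set -eu

workdir=$(mktemp -d)
mkdir -p "$workdir/build/sub"
printf 'hello\n' > "$workdir/build/a.txt"
printf 'world\n' > "$workdir/build/sub/b.txt"

# Hypothetical helper: hash a tar.gz of the build directory,
# created with the flags from the command above.
archive_hash() {
    tar --sort=name --owner=root:0 --group=root:0 \
        --mtime='UTC 1980-02-01' \
        -cf - -C "$workdir" build | gzip -n | sha256sum | cut -d' ' -f1
}

hash_a=$(archive_hash)

# Simulate a clean rebuild: same contents, different on-disk mtimes.
touch -d '2001-01-01 00:00:00' \
    "$workdir/build/a.txt" "$workdir/build/sub/b.txt" \
    "$workdir/build/sub" "$workdir/build"

hash_b=$(archive_hash)

if [ "$hash_a" = "$hash_b" ]; then
    echo "reproducible: $hash_a"
else
    echo "archives differ: $hash_a vs $hash_b"
fi
```

If the two hashes differ on your system, dropping one flag at a time is a quick way to see which piece of metadata is leaking in.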

Upvotes: 40

StackzOfZtuff

Reputation: 3106

Many options required

Reproducible-builds.org has a long explainer online for the tar archive format(s).

They arrive at this:

Full example

The recommended way to create a Tar archive is thus:

# requires GNU Tar 1.28+
$ tar --sort=name \
     --mtime="@${SOURCE_DATE_EPOCH}" \
     --owner=0 --group=0 --numeric-owner \
     --pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime,delete=ctime \
     -cf product.tar build

But I suggest you run it without the ${SOURCE_DATE_EPOCH} variable and use a literal "@0" instead:

# requires GNU Tar 1.28+
$ tar --sort=name \
     --mtime="@0" \
     --owner=0 --group=0 --numeric-owner \
     --pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime,delete=ctime \
     -cf product.tar build
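
If you want one script that covers both variants, you could default the variable to 0 and then list the archive to see what was actually stored. A rough sketch, assuming GNU tar 1.28+; the demo directory and file names are hypothetical:

```shell
#!/bin/sh
# Sketch only: falls back to epoch 0 when SOURCE_DATE_EPOCH is not set.
# Assumes GNU tar 1.28+. The demo paths are hypothetical.
set -eu

# Keep an exported SOURCE_DATE_EPOCH if there is one, else use 0.
: "${SOURCE_DATE_EPOCH:=0}"

demo=$(mktemp -d)
mkdir "$demo/build"
printf 'artifact\n' > "$demo/build/out.txt"

tar --sort=name \
    --mtime="@${SOURCE_DATE_EPOCH}" \
    --owner=0 --group=0 --numeric-owner \
    --pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime,delete=ctime \
    -cf "$demo/product.tar" -C "$demo" build

# Show the stored metadata: full timestamps in UTC, numeric owner/group.
listing=$(tar --utc --full-time --numeric-owner -tvf "$demo/product.tar")
printf '%s\n' "$listing"
```

Every entry in the listing should show owner/group 0/0 and the same timestamp, whatever the files look like on disk.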

Read the full explainer here: https://reproducible-builds.org/docs/archives/

BTW: 1. The nice strip-nondeterminism tool unfortunately does NOT work here, because it does not have a handler for TAR files.

BTW: 2. GNU tar also has an explainer online. They also mention gzip for creating .tar.gz/.tgz files: https://www.gnu.org/software/tar/manual/html_node/Reproducibility.html

Upvotes: 11

Evangeline

Reputation: 87

If anyone wants to create reproducible hashes of a directory: I slightly extended @StackzOfZtuff's answer to also delete the remaining timestamps (btime, mtime):

tar \
  --sort=name \
  --mtime="@0" \
  --owner=0 \
  --group=0 \
  --numeric-owner \
  --pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime,delete=ctime,delete=btime,delete=mtime \
  -cf - -C yourDirectoryPath . | sha256sum | head -c 64

Also, if you want to debug why two tar archives differ even though they were created from directories with the same content, you can pipe each archive through strings to get a more human-readable form and then diff the results:

git diff --color=always <(
  tar ...params -cf - -C dir1 . | strings
) <(
  tar ...params -cf - -C dir2 . | strings
)
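
As an alternative to strings, you can diff the tar listings, which gives one line of metadata (mode, owner, size, time, name) per entry. A minimal sketch, assuming GNU tar 1.28+; dir1, dir2 and the list_entries helper are placeholders, and the mode difference is planted on purpose:

```shell
#!/bin/sh
# Sketch: find metadata differences by diffing tar listings.
# Assumes GNU tar 1.28+. Directories and helper name are hypothetical.
set -eu

tmp=$(mktemp -d)
mkdir "$tmp/dir1" "$tmp/dir2"
printf 'same\n' > "$tmp/dir1/f.txt"
printf 'same\n' > "$tmp/dir2/f.txt"
chmod 644 "$tmp/dir1/f.txt"
chmod 600 "$tmp/dir2/f.txt"   # deliberate metadata-only difference

# Archive a directory, then print one metadata line per entry.
list_entries() {
    tar --sort=name --mtime="@0" --owner=0 --group=0 --numeric-owner \
        -cf - -C "$1" . |
        tar --utc --full-time --numeric-owner -tvf -
}

list_entries "$tmp/dir1" > "$tmp/list1.txt"
list_entries "$tmp/dir2" > "$tmp/list2.txt"

# diff exits non-zero when the listings differ, so guard it under set -e.
report=$(diff "$tmp/list1.txt" "$tmp/list2.txt" || true)
printf '%s\n' "$report"
```

Here the diff should point at f.txt with differing permission bits, which is hard to spot in strings output.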


Upvotes: 0

Charles Duffy

Reputation: 295223

GNU tar has a --mtime argument, which can be used to store a fixed date in the archive rather than a file's actual mtime:

tar --mtime='1970-01-01' input ...

When compressing a tarball with gzip, it's also necessary to specify -n to prevent the name and timestamp of the tar archive from being stored in the gzip header:

tar --mtime='1970-01-01' input ... | gzip -n >input.tar.gz
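
To see the effect of -n in isolation, you can compress the same bytes twice with a different file mtime each time. This sketch assumes GNU gzip and GNU touch, and compresses a named stand-in file rather than a pipe, since that is the case where gzip has a name and mtime to record; the file names are made up:

```shell
#!/bin/sh
# Sketch: compare gzip output with and without -n.
# Assumes GNU gzip and GNU touch. File names are hypothetical.
set -eu

tmp=$(mktemp -d)
printf 'same bytes\n' > "$tmp/input.tar"   # stand-in for a real tar file

touch -d '2001-01-01 00:00:00' "$tmp/input.tar"
gzip -c  "$tmp/input.tar" > "$tmp/plain1.gz"
gzip -nc "$tmp/input.tar" > "$tmp/bare1.gz"

# Same content, different mtime.
touch -d '2002-02-02 00:00:00' "$tmp/input.tar"
gzip -c  "$tmp/input.tar" > "$tmp/plain2.gz"
gzip -nc "$tmp/input.tar" > "$tmp/bare2.gz"

if cmp -s "$tmp/plain1.gz" "$tmp/plain2.gz"; then
    plain=identical
else
    plain=different
fi

if cmp -s "$tmp/bare1.gz" "$tmp/bare2.gz"; then
    bare=identical
else
    bare=different
fi

echo "without -n: $plain"
echo "with -n:    $bare"
```

The pair made with -n should compare identical, because neither the file name nor the mtime goes into the gzip header.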

Upvotes: 28
