user34812

Reputation: 523

Create .jar files deterministically (identical each time)

I use the jar command to build jar files. While trying to cache the jar files using MD5 checksums, I found that jars built from exactly the same sources had different MD5 checksums.
Upon closer inspection, I found that every time the jar was created, its contents were exactly the same (diff -qr on the extracted files was empty). It turns out that the creation timestamp is encoded in the jar file, which throws off the MD5 checksum. Others have run into the same problem here.

There is even a blog post on how to create identical jar files each time with Maven. However, I want a simple command-line solution that uses readily available commands such as jar and zip (I may have to do this on a server without install permissions), ideally producing the same "functional" jar I currently get from the jar command.

EDIT: For my purposes, it would also suffice to quickly compute an MD5 that is the same across builds, even if the jars themselves are not byte-identical. The only way I have found so far is to extract the files in the jar and compute the MD5 of every component file. But I'm afraid that is slow for bigger jars and would defeat the purpose of caching them to avoid building them in the first place. Is there a better and faster solution?

Upvotes: 10

Views: 1760

Answers (2)

Shubham

Reputation: 11

The jar command always creates META-INF/MANIFEST.MF with the current time, and zip stores each file's timestamp and file attributes, so the SHA-256 or MD5 of two artifacts built from the same files will differ.

We need to make sure that the timestamps (created, last modified, accessed) and file attributes of all files that go into the jar or zip are always the same.

I have created the script below, which takes a jar or zip file and makes it deterministic by setting every timestamp to a constant and fixing the compression level and offset.

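#!/bin/bash
# Rebuild a jar/zip so that the same contents always produce the same bytes.

usage() {
    echo "Usage: ./createDeterministicArtifact.sh <zip/jar file name>"
    exit 1
}

info() {
    echo "$1"
}

strip_artifact() {
    if [ -z "${file}" ]; then
        usage
    fi
    if [ -f "${file}" ] && [ -s "${file}" ]; then
        local tmp out
        tmp=$(mktemp -d)
        # Absolute output path, so the cd below also works for relative inputs.
        out="$(cd "$(dirname "${file}")" && pwd)/$(basename "${file}").new"
        # zip updates an existing archive instead of replacing it, so start fresh.
        rm -f "${out}"
        unzip -oq -d "${tmp}" "${file}"
        # Give every extracted file and directory the same fixed timestamp.
        find -L "${tmp}" -exec touch -a -m -t 201912010000.00 {} +
        if [ "$UNAME" == "Linux" ]; then
            find -L "${tmp}" -exec chattr -a {} + 2>/dev/null
        elif [[ "$UNAME" == CYGWIN* || "$UNAME" == MINGW* ]]; then
            find -L "${tmp}" -exec attrib -A {} +
        fi
        (
            cd "${tmp}" || exit 1
            # Fixed entry order: the manifest first (JarInputStream looks for it
            # at the start), then all other files sorted bytewise.
            {
                if [ -f META-INF/MANIFEST.MF ]; then echo META-INF/MANIFEST.MF; fi
                find . -type f ! -path ./META-INF/MANIFEST.MF | sed 's|^\./||' | LC_ALL=C sort
            } | zip -q -D -X -9 -A --compression-method deflate "${out}" -@
        )
        rm -rf "${tmp}"
        info "Recreated deterministic artifact: ${out}"
    else
        info "Input file is missing or empty. Please validate the file and try again"
    fi
}

UNAME=$(uname -s)
file=$1
strip_artifact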

Upvotes: 1

Beck Yang

Reputation: 3024

The main issue is that the jar command always creates META-INF/MANIFEST.MF with the current time. The file time is saved in the zip entry header, which is why the MD5 value differs even though all file contents in the jar remain the same: different zip entry headers produce a different zip file.

With the jar command itself, the only solution is the -M option, which does not create a manifest file for the entries.

Upvotes: 4
