ltfishie

Reputation: 2987

How to calculate MD5 of a file before saving to gridfs

I am using the Java MongoDB driver for GridFS and would like to use the MD5 hash to check whether a file already exists before saving it. Essentially I am trying to do this in Java.

I tried DigestUtils from Apache Commons Codec with the following logic:

public GridFSDBFile save(InputStream inputStream, String contentType, String filename) throws IOException {
    String md5 = DigestUtils.md5Hex(inputStream);

    List<GridFSDBFile> md5match = gridFs.find(new BasicDBObject("md5", md5));

    if (md5match!=null && md5match.size()>0) {
        return md5match.get(0);
    } else {
        GridFSInputFile input = gridFs.createFile(inputStream, filename, true);
        input.save();
        return gridFs.findOne(input.getId());
    }
}

Looking at the underlying implementation, both DigestUtils and the MongoDB driver use MessageDigest.getInstance("MD5") to calculate the MD5 hash. However, the MD5 hash generated by DigestUtils is not the same as the one GridFS generates. Overwriting the "md5" key in GridFSInputFile does not work either.

Upvotes: 3

Views: 2933

Answers (1)

ltfishie

Reputation: 2987

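The answer turns out to have nothing to do with the MongoDB driver. To calculate the MD5 hash, DigestUtils must read the InputStream through to the end. By the time the same stream is passed to gridFs.createFile there is nothing left to read, so GridFS hashes different (empty) content. For the above code to work properly, rewind the stream with mark/reset, which requires a stream for which markSupported() returns true: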

inputStream.mark(Integer.MAX_VALUE);
String md5 = DigestUtils.md5Hex(inputStream);
inputStream.reset();
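
To see the effect in isolation, here is a minimal sketch that uses only the JDK. The class Md5MarkReset and the helpers md5Hex and md5HexAndRewind are hypothetical names made up for illustration; md5Hex stands in for DigestUtils.md5Hex so the example has no dependencies. It assumes a stream that may not support mark/reset, and wraps it in a BufferedInputStream when needed.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5MarkReset {

    // Hypothetical stand-in for DigestUtils.md5Hex: reads the stream to the end
    // and returns the MD5 of whatever bytes were still available.
    static String md5Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            digest.update(buffer, 0, read);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Hypothetical helper: hashes the stream, then rewinds it to where it started.
    // The caller must pass a stream that supports mark/reset.
    static String md5HexAndRewind(InputStream in) throws IOException, NoSuchAlgorithmException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("wrap the stream in a BufferedInputStream first");
        }
        in.mark(Integer.MAX_VALUE);
        String md5 = md5Hex(in);
        in.reset();
        return md5;
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);

        // Without mark/reset: the second read sees an exhausted stream.
        InputStream consumed = new ByteArrayInputStream(data);
        System.out.println(md5Hex(consumed)); // 5d41402abc4b2a76b9719d911017c592
        System.out.println(md5Hex(consumed)); // d41d8cd98f00b204e9800998ecf8427e (MD5 of zero bytes)

        // With mark/reset: both reads see the same bytes.
        InputStream rewindable = new BufferedInputStream(new ByteArrayInputStream(data));
        System.out.println(md5HexAndRewind(rewindable)); // 5d41402abc4b2a76b9719d911017c592
        System.out.println(md5Hex(rewindable));          // 5d41402abc4b2a76b9719d911017c592
    }
}
```

Note that mark(Integer.MAX_VALUE) on a BufferedInputStream keeps everything read since the mark in memory, so for very large files it may be better to spool to a temporary file or hash while copying.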

Upvotes: 3
