Reputation: 2987
I am using the Java MongoDB driver for GridFS and would like to use an MD5 hash to check whether a file already exists before saving it. Essentially I am trying to do this in Java.
I tried DigestUtils from Apache Commons Codec with the following logic:
public GridFSDBFile save(InputStream inputStream, String contentType, String filename) throws IOException {
    String md5 = DigestUtils.md5Hex(inputStream);
    List<GridFSDBFile> md5match = gridFs.find(new BasicDBObject("md5", md5));
    if (md5match != null && md5match.size() > 0) {
        return md5match.get(0);
    } else {
        GridFSInputFile input = gridFs.createFile(inputStream, filename, true);
        input.save();
        return gridFs.findOne(input.getId());
    }
}
Looking at the underlying implementation, both DigestUtils and the MongoDB driver use MessageDigest.getInstance("MD5") to calculate the MD5 hash. However, it looks like the MD5 hash generated by DigestUtils is not the same as the one GridFS generates. Overwriting the "md5" key in GridFSInputFile does not work either.
Upvotes: 3
Views: 2933
Reputation: 2987
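The answer turns out to have nothing to do with the MongoDB driver. To calculate the MD5 hash, DigestUtils must read through the entire InputStream, which leaves the stream exhausted, so there is nothing left for GridFS to read when the file is saved and its hash is computed over different (empty) content. For the above code to work properly, use mark/reset to rewind the stream after hashing; this requires a stream whose markSupported() returns true, such as a BufferedInputStream: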
inputStream.mark(Integer.MAX_VALUE);
String md5 = DigestUtils.md5Hex(inputStream);
inputStream.reset();
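The effect can be reproduced without MongoDB at all. The following is a minimal sketch, assuming only the JDK: it uses MessageDigest directly in place of DigestUtils to stay dependency-free, and the class name Md5MarkResetDemo and both helper methods are hypothetical names made up for illustration. It hashes the same stream twice, first without and then with mark/reset.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class; not part of the MongoDB driver or Commons Codec.
public class Md5MarkResetDemo {

    // Reads the stream to its end and returns the MD5 digest as lowercase hex.
    static String md5Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            digest.update(buffer, 0, read);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Hashes the stream, then rewinds it so the caller can read the same bytes again.
    static String md5HexPreservingStream(InputStream in) throws IOException, NoSuchAlgorithmException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException(
                    "stream does not support mark/reset; wrap it in a BufferedInputStream");
        }
        in.mark(Integer.MAX_VALUE);
        try {
            return md5Hex(in);
        } finally {
            in.reset();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);

        // Without mark/reset: the first pass consumes the stream,
        // so the second pass hashes zero bytes.
        InputStream consumed = new ByteArrayInputStream(data);
        System.out.println(md5Hex(consumed)); // 5d41402abc4b2a76b9719d911017c592
        System.out.println(md5Hex(consumed)); // d41d8cd98f00b204e9800998ecf8427e (MD5 of empty input)

        // With mark/reset: the stream is rewound, so both passes see the same bytes.
        InputStream rewound = new BufferedInputStream(new ByteArrayInputStream(data));
        System.out.println(md5HexPreservingStream(rewound)); // 5d41402abc4b2a76b9719d911017c592
        System.out.println(md5Hex(rewound));                 // 5d41402abc4b2a76b9719d911017c592
    }
}
```

Note that marking a BufferedInputStream with Integer.MAX_VALUE lets it buffer everything read since the mark in memory, so for very large files it may be preferable to reopen the source or hash while copying.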
Upvotes: 3