user2263572

Reputation: 5606

Adding md5 hash value to mongo collection

Issue: I currently have a mongo collection with 100,000 documents. Each document has 3 fields (_id, name, age). I want to add a 4th field to each document, called hashValue, that stores the md5 hash of that document's name field.

I currently can interact with my collection via the mongo shell or via Mongoose ODM as part of a nodeJS app.

Possible Solutions:

  1. Use Mongoose/nodeJs:

I realize this won't work (don't believe you can iterate through a cursor in this manner), but hopefully it shows what I'm trying to do.

var crypto = require('crypto');

    MyCollection.find().forEach(function(el){
        var hash = crypto.createHash('md5').update(el.name).digest("hex");
        el.hashValue = hash;
        el.save()
    });
  2. Use mongo Shell: almost the same as above, and I realize something like the above syntax would work there. The only issue is that I don't know how to create the md5 hash in the mongo shell, although I am able to iterate through each document and add a field.

  3. (Possible workaround): The goal of this is to be able to query based on the md5 hash of a name value. I believe mongo allows you to create a hashed index (link here). The only issue is that I can't find an example of anyone using this for querying (it only seems to be used for sharding), and I'm not sure whether that will work later on. (Example: I want to md5 hash a name I collect from a user, and then query my mongo collection to see if I can find that md5 hash in the hashValue field.)

Upvotes: 5

Views: 17738

Answers (4)

user109764

Reputation: 654

As of now (version 7), you can use hex_md5 inside the $function aggregation operator:

$addFields: {
  _md5: {
    $function: {
      body: function(token1, currency) {
        return hex_md5(token1 + "_" + currency);
      },
      args: ["$_token1", "$_currency"],
      lang: "js"
    }
  },
}

Upvotes: 2

Volodymyr Synytskyi

Reputation: 4055

You can iterate through the cursor in Mongoose using streams and update all the records with a bulk operation.

var crypto = require('crypto');

mongoose.connection.on("open", function() {
    var bulk = MyCollection.collection.initializeUnorderedBulkOp();
    // newer Mongoose versions replace .stream() with .cursor()
    MyCollection.find().stream()
        .on('data', function(el){
            var hash = crypto.createHash('md5').update(el.name).digest("hex");
            // add document update operation to a bulk
            // (writes to hashValue, as the question asks, rather than overwriting name)
            bulk.find({'_id': el._id}).update({$set: {hashValue: hash}});
        })
        .on('error', function(err){
            // handle error
            console.error(err);
        })
        .on('end', function(){
            // execute all bulk operations
            bulk.execute(function (error) {
                // final callback
                if (error) console.error(error);
            });
        });
});
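
For a large collection it may be easier to build the update operations as plain objects and send them in batches rather than as one big bulk. A minimal sketch, assuming documents with _id and name fields as in the question; md5Hex, buildHashUpdates and chunk are hypothetical helper names, and the objects are in the updateOne shape that bulkWrite accepts:

```javascript
const crypto = require('crypto');

// Hex-encoded MD5 digest of a string.
function md5Hex(value) {
  return crypto.createHash('md5').update(String(value)).digest('hex');
}

// Turn documents into bulkWrite-style updateOne operations (hypothetical helper).
function buildHashUpdates(docs) {
  return docs.map(function (doc) {
    return {
      updateOne: {
        filter: { _id: doc._id },
        update: { $set: { hashValue: md5Hex(doc.name) } }
      }
    };
  });
}

// Split an array into batches of at most `size` items (hypothetical helper).
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

Each batch could then be passed to something like MyCollection.bulkWrite(batch); the batch size is a judgment call and is not prescribed by anything above.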

Upvotes: 1

I personally would not go with option 3 (the possible workaround), for two reasons:

  1. When querying the data, the application has to use the same hash function, in the same way, as MongoDB does to derive the hash value. I think MongoDB uses MD5 and considers only the first 64 bits of the hash. The disadvantage I see is that the application gets tied to the internal implementation of MongoDB's hashing, which could change at any point.

  2. Hashed indexes are good for point queries (equality queries), but they don't support range queries (e.g. age > 50) or like/regex queries (e.g. db.users.find({"name": /ABC/})).

One thing that is not clear is why you want to store the MD5 of the name field instead of creating a normal index on the name field itself. Knowing that may help in arriving at the answer.

Upvotes: 0

Sarath Nair

Reputation: 2868

The mongo shell already has a built-in md5 hash function called hex_md5 (it is a shell helper, not part of standard JavaScript), so it is available in the mongo console.

> hex_md5('john')
527bd5b5d689e2c32ae974c6229ff785

So to update records in your case you can use the following code snippet in mongo console:

db.collection.find().forEach( function(data){
  // set only the new field instead of rewriting the whole document
  db.collection.updateOne({_id: data._id}, {$set: {hashValue: hex_md5(data.name)}});
});
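
To look a name up by its hash later from the Node app, the same digest has to be computed on the application side. A minimal sketch, assuming Node's built-in crypto module; hashQueryFor is a hypothetical helper name, and the filter it returns is meant for something like MyCollection.findOne(...):

```javascript
const crypto = require('crypto');

// Build a query filter matching documents whose hashValue equals
// the hex MD5 of the given name (hypothetical helper).
function hashQueryFor(name) {
  const digest = crypto.createHash('md5').update(name, 'utf8').digest('hex');
  return { hashValue: digest };
}

console.log(hashQueryFor('john'));
// prints { hashValue: '527bd5b5d689e2c32ae974c6229ff785' }
```

For an ASCII name such as 'john' this agrees with the hex_md5 output shown above; names with non-ASCII characters are worth spot-checking against the shell before relying on the match.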

Upvotes: 19
