Reputation: 5606
Issue: I currently have a mongo collection with 100,000 documents. Each document has 3 fields (_id, name, age). I want to add a 4th field to each document called hashValue that stores the md5 hash of that document's name field.
I currently can interact with my collection via the mongo shell or via Mongoose ODM as part of a nodeJS app.
Possible Solutions:
I realize this won't work (don't believe you can iterate through a cursor in this manner), but hopefully it shows what I'm trying to do.
var crypto = require('crypto');
MyCollection.find().forEach(function(el){
    var hash = crypto.createHash('md5').update(el.name).digest("hex");
    // store the hash in the new hashValue field rather than overwriting name
    el.hashValue = hash;
    el.save();
});
Use mongo Shell - Almost the same as above, and I realize something like the above syntax would work there. The only issue is that I don't know how to create the md5 hash in the mongo shell. I am, however, able to iterate through each document and add a field.
(possible workaround) - The goal of this is to be able to query based off the md5 hash of a name value. I believe mongo allows you to create a hashed index (link here). Only issue is that I can't find an example of anyone using this for querying (only seems to be used for sharding) and I'm not sure if that will work later on. (Example: I want to md5 hash a name I collect from a user, and then query my mongo collection to see if I can find that md5 hash in the hashValue field)
Upvotes: 5
Views: 17738
Reputation: 654
As of now (MongoDB version 7) you can call hex_md5 inside the $function aggregation operator:
{
    $addFields: {
        _md5: {
            $function: {
                body: function(token1, currency) {
                    return hex_md5(token1 + "_" + currency);
                },
                args: ["$_token1", "$_currency"],
                lang: "js"
            }
        },
    }
}
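To relate this to the question's fields, here is a rough sketch of what the pipeline might look like when hashing name into hashValue. The collection name 'mycollection' and the trailing $merge stage are my assumptions, not part of the snippet above, and server-side JavaScript has to be enabled for $function to run. I have not tested it against a 100,000-document collection.

```javascript
// Sketch only: adapts the $function stage above to the question's fields.
// 'mycollection' is a hypothetical collection name used for illustration.
const pipeline = [
  {
    $addFields: {
      hashValue: {
        $function: {
          // The body is passed as a string because hex_md5 exists only in
          // MongoDB's server-side JavaScript, not in Node.js.
          body: 'function(name) { return hex_md5(name); }',
          args: ['$name'],
          lang: 'js',
        },
      },
    },
  },
  // Assumed write-back step: merge the computed field into the same
  // collection, matching on _id and never inserting new documents.
  {
    $merge: {
      into: 'mycollection',
      on: '_id',
      whenMatched: 'merge',
      whenNotMatched: 'discard',
    },
  },
];

// Print the pipeline so it can be pasted into db.mycollection.aggregate(...)
console.log(JSON.stringify(pipeline, null, 2));
```

Building the pipeline as a plain object keeps it easy to inspect before running it against real data.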
Upvotes: 2
Reputation: 4055
You can iterate through the cursor in Mongoose using streams and update all the records with a bulk operation.
var crypto = require('crypto');
mongoose.connection.on("open", function(err, conn) {
    var bulk = MyCollection.collection.initializeUnorderedBulkOp();
    MyCollection.find().stream()
        .on('data', function(el) {
            var hash = crypto.createHash('md5').update(el.name).digest("hex");
            // queue an update that stores the hash in the new hashValue field
            bulk.find({'_id': el._id}).update({$set: {hashValue: hash}});
        })
        .on('error', function(err) {
            // handle error
        })
        .on('end', function() {
            // execute all queued bulk operations
            bulk.execute(function(error) {
                // final callback (callback is assumed to be defined by the caller)
                callback();
            });
        });
});
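With 100,000 documents it may be worth flushing the queued updates in batches rather than holding them all until the end. Below is a minimal sketch of that idea. The helpers md5Hex and hashInBatches and the batch size are hypothetical names of mine, and the flush callback stands in for whatever actually writes to the database (for example, something like MyCollection.bulkWrite(ops), assuming a Mongoose version that provides it). The demo runs on in-memory documents so it works without a database.

```javascript
const crypto = require('crypto');

// Hypothetical helper: MD5 hex digest of a string.
function md5Hex(text) {
  return crypto.createHash('md5').update(text, 'utf8').digest('hex');
}

// Hypothetical helper: walk over documents (an array or an async iterable)
// and hand updateOne operations to `flush` in batches of `batchSize`.
// Returns the total number of operations that were flushed.
async function hashInBatches(docs, flush, batchSize = 1000) {
  let ops = [];
  let flushed = 0;
  for await (const doc of docs) {
    ops.push({
      updateOne: {
        filter: { _id: doc._id },
        update: { $set: { hashValue: md5Hex(doc.name) } },
      },
    });
    if (ops.length >= batchSize) {
      await flush(ops);
      flushed += ops.length;
      ops = [];
    }
  }
  // Flush whatever is left over after the last full batch.
  if (ops.length > 0) {
    await flush(ops);
    flushed += ops.length;
  }
  return flushed;
}

// Demo with in-memory documents and a flush that only logs batch sizes.
const sampleDocs = [
  { _id: 1, name: 'john' },
  { _id: 2, name: 'jane' },
  { _id: 3, name: 'joe' },
];
hashInBatches(sampleDocs, async (ops) => {
  console.log('flushing batch of', ops.length);
}, 2).then((total) => console.log('total operations:', total));
```

The batch size trades memory for round trips. 1000 is only a placeholder, not a recommendation.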
Upvotes: 1
Reputation: 8247
I personally would not go with option 3 (i.e., the possible workaround), for two reasons. First, when querying the data we have to make sure that the application uses the same hash function, in the same way, as MongoDB does to derive the hash value. I think MongoDB uses MD5 and considers only the first 64 bits of the hash. Second, the application gets tied to the internal implementation of MongoDB's hashing, which could change at any point.
One thing that is not clear is why you want to store the MD5 of the name field instead of creating a normal index on the name field itself. Knowing that may help in arriving at the answer.
Upvotes: 0
Reputation: 2868
The mongo shell already provides an md5 hash function called hex_md5 (it is a shell helper, not part of standard JavaScript), so it is available in the mongo console.
> hex_md5('john')
527bd5b5d689e2c32ae974c6229ff785
So to update records in your case you can use the following code snippet in mongo console:
db.collection.find().forEach(function(data) {
    data.hashValue = hex_md5(data.name);
    db.collection.save(data);
});
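For the later lookup from the Node.js side (hashing a name collected from a user and querying hashValue), something along these lines might work. This is only a sketch: md5Hex and hashQuery are hypothetical helper names, and it assumes that Node's crypto MD5 digest matches what hex_md5 produces in the shell for the same input, which should hold for plain ASCII names but is worth checking for other characters.

```javascript
const crypto = require('crypto');

// Hypothetical helper: MD5 hex digest of a string, computed in Node.js.
function md5Hex(text) {
  return crypto.createHash('md5').update(text, 'utf8').digest('hex');
}

// Hypothetical helper: build the filter used to look up a user-supplied name
// by its hash, e.g. MyCollection.find(hashQuery(name)).
function hashQuery(name) {
  return { hashValue: md5Hex(name) };
}

// Should print the same digest as hex_md5('john') in the shell session above.
console.log(md5Hex('john'));
console.log(hashQuery('john'));
```

An index on hashValue would let this lookup avoid a full collection scan.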
Upvotes: 19