Nate Vaughan

Reputation: 3839

Iterating over MongoDB collection to duplicate all documents is painfully slow

I have a MongoDB collection of 7,000,000 documents (each perhaps 1-2 KB of BSON) that I would like to duplicate, modifying one field. The field is a string holding a numeric value, and I would like to increment it by 1 in each copy.

From the Mongo shell, I took the following approach:

> var all = db.my_collection.find()
> all.forEach(function(it) { 
... it._id = 0; // to force mongo to create a new objectId
... it.field = (parseInt(it.field) + 1).toString();
... db.my_collection.insert(it);
... })

Executing the above code is taking an extremely long time. At first I thought the code was broken somehow, but when I checked the collection from a separate terminal about an hour later, the process was still running and there were now 7,000,001 documents! Sure enough, exactly 1 new document matched the incremented field.

For context, I'm running a 2015 MBP with 4 cores and 16 GB RAM. I see mongo near the top of my CPU usage, averaging about 85%.

1) Am I missing a bulk modify/update capability in MongoDB?

2) Any reason why the above operation would be working, yet working so slowly that it is updating a document at a rate of 1/hr?

Upvotes: 0

Views: 234

Answers (2)

Aakash Verma

Reputation: 3994

Try the db.collection.mapReduce() way:

NB: A single emit can only hold half of MongoDB’s maximum BSON document size.

var mapFunction1 = function() {
                       emit(ObjectId(), (parseInt(this.field) + 1).toString());
                   };

MongoDB will not call the reduce function for a key that has only a single value.

var reduceFunction1 = function(id, field) {
                          return field;
                      };

Finally,

db.my_collection.mapReduce(
    mapFunction1,
    reduceFunction1,
    {"out": "my_collection"} // Replaces the entire content; consider merge
)

Upvotes: 1

Nate Vaughan

Reputation: 3839

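I'm embarrassed to say that I was mistaken in thinking that this line:

... it._id = 0; // to force mongo to create a new objectId

forces mongo to create a new ObjectId. It does not: every copy was given the literal _id 0, so the first insert succeeded and each later one was rejected as a duplicate key, which is why exactly 1 new document appeared. Instead I needed to be explicit: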

... it._id = ObjectId();
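For anyone landing here with the first question (bulk inserts), below is a minimal sketch of how the same loop could be batched. It assumes a shell recent enough to have insertMany (3.2 or later). The helper names bumpedCopy and duplicateAll, the batch size of 1000, and the target collection my_collection_copy are all made up for illustration. Deleting _id rather than assigning it lets the shell or server generate a fresh ObjectId on insert.

```javascript
// Strip _id so a fresh ObjectId is generated on insert,
// then increment the numeric string stored in `field`.
function bumpedCopy(doc) {
  delete doc._id;
  doc.field = (parseInt(doc.field, 10) + 1).toString();
  return doc;
}

// Walk anything that has forEach (a shell cursor or a plain array),
// hand documents to insertBatch in groups of batchSize,
// and return how many documents were handed over.
function duplicateAll(source, insertBatch, batchSize) {
  var batch = [];
  var total = 0;
  source.forEach(function (doc) {
    batch.push(bumpedCopy(doc));
    if (batch.length >= batchSize) {
      insertBatch(batch);
      total += batch.length;
      batch = [];
    }
  });
  if (batch.length > 0) { // flush the final partial batch
    insertBatch(batch);
    total += batch.length;
  }
  return total;
}

// Possible use in the mongo shell (my_collection_copy is a hypothetical target):
// duplicateAll(
//   db.my_collection.find(),
//   function (batch) { db.my_collection_copy.insertMany(batch, { ordered: false }); },
//   1000
// );
```

Writing the copies to a separate collection is a deliberate choice in this sketch: a cursor that reads from the same collection it inserts into may come across the freshly inserted copies. If the copies have to end up in the original collection, filtering the find() on the original field values is one way to guard against that.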

Upvotes: 0
