Reputation: 8071
We have around 20 million records in our mongodb. In my collection called 'posts' there is a field called 'id' which was supposed to be unique but now it has gotten all messed up. We just want it to be unique and there are many many duplicates now.
We just wanted to do something like iterating over every reocrd and assigning it a unique id in a loop from 1 to 20million.
What would be the easiest way to do this?
Upvotes: 0
Views: 159
Reputation: 230346
There are not many options here, really.
Pick your language and driver of choice.
Fetch N documents.
Assign unique ids to them (several options here: 1) copy _id; 2) assign new ObjectID; 3) assign plain integer)
Save those documents.
Fetch next N documents. Go to step 3.
To fetch next N documents, you should note the last processed document's _id and do this:
db.collection.find({_id: {$gt: last_processed_id}}).sort({_id: 1}).limit(N);
Do not use skip here. It will be too slow.
And, of course, you can always truncate the collection, create unique index on id
and populate it again.
Upvotes: 2
Reputation: 66237
You can use a simple script like this:
db.posts.dropIndex("*id index name here*"); // Drop Unique index
counter = 0;
page = 1;
slice = 1000;
total = db.posts.count();
conditions = {};
while (counter < total) {
cursor = db.posts.find(conditions, {_id: true}).sort({_id: 1}).limit(slice);
while (cursor.hasNext()) {
row = cursor.next();
db.posts.update({_id: row._id}, {$set: {id: ++counter}});
}
conditions['_id'] = {$gt: row._id};
print("Processed " + counter + " rows");
}
print('Adding id index');
db.posts.ensureIndex({id: 1}, {unique: true, background: false});
print("done");
save it to assignids.js, and run as
$ mongo dbname assignids.js
the outer-while selects 1000 rows as a time, and prevents cursor timeouts; the inner while assigns each row a new incremental id.
Upvotes: 1