In my current project we use Mongo to store a lot of documents (approximately 100 billion). How can I remove the oldest half of the documents using the _id field? If I use the indexed "timestamp" field instead, the operation will take about 3 years to complete at the current speed.
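If your _id values are default ObjectIds, they begin with a creation timestamp, so sorting on _id orders documents roughly by insertion time. Simply find the middle _id and remove all older entries: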
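Mongo shell:
// half of the total document count
var c = Math.floor(db.collection.stats().count / 2)
// find the middle _id; sort on _id explicitly, because without a sort the cursor returns documents in natural order, which need not match _id order
var mid_id = db.collection.find({}, {_id: 1}).sort({_id: 1}).skip(c).limit(1).next()._id
// remove all documents whose _id is older than the middle one
db.collection.remove({_id: {$lt: mid_id}})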
Here is a link to a MongoDB-User Google Groups post that discusses generating ObjectIds based on timestamps: http://groups.google.com/group/mongodb-user/browse_thread/thread/262223bb0bd52a83/3fd9b01d0ad2c41b
From the post: extracting the timestamp from MongoDB ObjectIds is explained in the MongoDB documentation page "Optimizing Object IDs": http://www.mongodb.org/display/DOCS/Optimizing+Object+IDs#OptimizingObjectIDs-Extractinsertiontimesfromidratherthanhavingaseparatetimestampfield
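In the shell, ObjectId(...).getTimestamp() performs this extraction directly. To show what it does, here is a minimal plain-JavaScript sketch (Node.js or mongosh), assuming you have the id as a 24-character hex string; the helper name `dateFromObjectIdHex` is hypothetical:

```javascript
// Hypothetical helper: recover the creation time stored in an ObjectId.
// The first 8 hex digits (4 bytes) are big-endian seconds since the Unix epoch.
function dateFromObjectIdHex(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hex ObjectId string");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

console.log(dateFromObjectIdHex("4f9061260000000000000000").toISOString());
// 2012-04-19T19:01:58.000Z
```

The result has one-second resolution, because an ObjectId stores only whole seconds.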
Following the example in the post, an ObjectId can be created from a time given in seconds since the Unix epoch:
> now = new Date()
ISODate("2012-04-19T19:01:58.841Z")
> ms = now.getTime()
1334862118841
> sec = Math.floor(ms/1000)
1334862118
> hex = sec.toString(16)
4f906126
> id_string = hex + "0000000000000000"
4f9061260000000000000000
> my_id = ObjectId(id_string)
ObjectId("4f9061260000000000000000")
Using this approach, you can create an ObjectId from any date and query for documents with smaller ObjectIds, i.e. documents created before that date.
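One detail the transcript glosses over: `sec.toString(16)` only yields 8 hex digits for dates after mid-1978, so a general helper should left-pad the timestamp. A minimal sketch in plain JavaScript (Node.js or mongosh); `objectIdHexFromDate` and `cutoff` are hypothetical names, and the query in the comment assumes your _id values are default, time-based ObjectIds:

```javascript
// Hypothetical helper: hex string of the smallest ObjectId that could have been
// generated at `date` (4 timestamp bytes set, remaining 8 bytes zero).
function objectIdHexFromDate(date) {
  const seconds = Math.floor(date.getTime() / 1000);
  if (seconds < 0 || seconds > 0xffffffff) {
    throw new RangeError("date is outside the 32-bit ObjectId timestamp range");
  }
  // Left-pad so the timestamp is always exactly 8 hex digits
  const tsHex = ("00000000" + seconds.toString(16)).slice(-8);
  return tsHex + "0000000000000000";
}

// In the shell, wrap the hex in ObjectId(...) to delete everything older than `cutoff`:
//   db.collection.deleteMany({ _id: { $lt: ObjectId(objectIdHexFromDate(cutoff)) } })
const cutoff = new Date("2012-04-19T19:01:58.841Z");
console.log(objectIdHexFromDate(cutoff)); // 4f9061260000000000000000
```

Because the remaining 8 bytes are zero, `$lt` on this id matches exactly the ObjectIds whose timestamp is earlier than the cutoff second.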
Going forward, if your application saves data based on time and deletes it once it reaches a certain age, you may find it preferable to store your documents in separate collections: one for each day, week, or whatever time frame makes the most sense for your application. Dropping an entire collection involves far less overhead than removing individual documents, because it is a single operation. By contrast, db.<collection>.remove({query}) performs a write operation for every matching document, which, as you have observed, can be prohibitively slow for a large number of documents.
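A minimal sketch of the collection-per-day idea in plain JavaScript (Node.js or mongosh). The `events_YYYYMMDD` naming scheme and the helper names are hypothetical, and the prefix is assumed to contain no regular-expression metacharacters. Because the date part is fixed-width, comparing names as strings orders them by day, so expiring old data means dropping every collection whose name sorts before the cutoff's:

```javascript
// Hypothetical naming scheme: one collection per UTC day, e.g. "events_20120419".
function collectionNameForDate(prefix, date) {
  const m = String(date.getUTCMonth() + 1);
  const d = String(date.getUTCDate());
  return prefix + "_" + date.getUTCFullYear() +
    (m.length < 2 ? "0" + m : m) + (d.length < 2 ? "0" + d : d);
}

// Return the names (from `names`) of per-day collections for days before `cutoff`.
function collectionsToDrop(prefix, names, cutoff) {
  const cutoffName = collectionNameForDate(prefix, cutoff);
  const pattern = new RegExp("^" + prefix + "_\\d{8}$");
  return names.filter(function (name) {
    return pattern.test(name) && name < cutoffName;
  });
}

// In mongosh you might then drop them one by one:
//   collectionsToDrop("events", db.getCollectionNames(), cutoff)
//     .forEach(function (name) { db.getCollection(name).drop(); });
const existing = ["events_20120417", "events_20120418", "events_20120419", "users"];
console.log(collectionsToDrop("events", existing, new Date("2012-04-19T00:00:00Z")).join(", "));
// events_20120417, events_20120418
```

Unrelated collections such as `users` are skipped by the name pattern, and the cutoff day itself is kept.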