user2710961

Reputation: 69

How to efficiently page batches of results with MongoDB

I am running the query below on my MongoDB collection, and it takes more than an hour to complete.

db.collection.find({language:"hi"}).sort({_id:-1}).skip(5000).limit(1)

I am trying to get the results in batches of 5000, in either ascending or descending order, for documents that have "hi" as the value of the language field. To do that, I skip the documents I have already processed by incrementing the "skip" value each time.

The collection holds just over 20 million documents. An index on the "language" field already exists. I am using MongoDB version 2.6.7.

Is there a more appropriate index for this query which can get the result faster?

Upvotes: 2

Views: 2604

Answers (2)

Blakes Seven

Reputation: 50406

In order to efficiently "page" through results in the way that you want, it is better to use a "range query" and keep the last value you processed.

Your desired "sort key" here is _id, so that makes things simple:

First you want an index in the correct order. Create it with .createIndex(), which replaces the deprecated .ensureIndex() method:

db.collection.createIndex({ "language": 1, "_id": -1 })

Then you want to do some simple processing, from the start:

var lastId = null;

var cursor = db.collection.find({language:"hi"});
cursor.sort({_id:-1}).limit(5000).forEach(function(doc) {
    // do something with your document. But always set the next line
    lastId = doc._id;
})

That's the first batch. Now when you move on to the next one:

var cursor = db.collection.find({ "language":"hi", "_id": { "$lt": lastId } });
cursor.sort({_id:-1}).limit(5000).forEach(function(doc) {
    // do something with your document. But always set the next line
    lastId = doc._id;
})

This way the lastId value is always taken into account when making the selection. You store it between batches and continue on from where the last one ended.

That is much more efficient than paging with .skip(), which, regardless of the index, still has to walk past every matching result up to the skip point before it returns anything.

Using the $lt operator here "filters" all the results you already processed, so you can move along much more quickly.
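The two batches above can be folded into one loop. What follows is only a rough sketch, assuming the legacy mongo shell's synchronous cursor methods (find, sort, limit, forEach) and an _id that is unique and comparable. The names processInBatches and handleDoc are made up for illustration and are not part of MongoDB. The descending flag switches between $lt and $gt, so the same loop also covers the ascending case from the question.

```javascript
// Hypothetical helper (not a MongoDB API): pages through the matching
// documents in _id order, using a range condition instead of .skip().
function processInBatches(collection, baseQuery, batchSize, descending, handleDoc) {
    var lastId = null;
    var total = 0;
    while (true) {
        // Copy the base filter so the caller's object is not modified.
        var query = {};
        for (var key in baseQuery) {
            query[key] = baseQuery[key];
        }
        // After the first batch, continue strictly past the last _id seen.
        if (lastId !== null) {
            query._id = descending ? { $lt: lastId } : { $gt: lastId };
        }
        var seen = 0;
        collection.find(query)
            .sort({ _id: descending ? -1 : 1 })
            .limit(batchSize)
            .forEach(function (doc) {
                handleDoc(doc);
                lastId = doc._id; // remember where this batch ended
                seen++;
            });
        total += seen;
        // A short (or empty) batch means there is nothing left to read.
        if (seen < batchSize) {
            break;
        }
    }
    return total;
}

// Possible use in the shell, newest documents first:
// processInBatches(db.collection, { language: "hi" }, 5000, true, function (doc) {
//     printjson(doc._id);
// });
```

If the work has to survive a restart, lastId would need to be stored somewhere between runs and passed back in, which this sketch does not do.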

Upvotes: 1

Philipp

Reputation: 69663

When you want to sort descending, you should create a multi-field index which uses the field(s) you sort on as descending field(s). You do that by setting those field(s) to -1.

This index should greatly increase the performance of your sort:

db.collection.ensureIndex({ language: 1, _id: -1 });

When you also want to speed up the other case - retrieving sorted in ascending order - create a second index like this:

db.collection.ensureIndex({ language: 1, _id: 1 });

Keep in mind that when you do not sort your results, you receive them in natural order. Natural order is often insertion order, but there is no guarantee of that. Various events can disturb the natural order, so when you care about the order you should always sort explicitly. The only exception to this rule is capped collections, which always maintain insertion order.

Upvotes: 4
