Rafa Acioly
Rafa Acioly

Reputation: 636

How can i sort before finding and limiting in a collection?

Which method of pymongo executes first? sort or limit?

I've a big collection where i need to filter for a specific discount category ("X"), e.g:

{"discount_value": 1, "name": "promotion_1", "category": ["X"]},
{"discount_value": 10, "name": "promotion_10", "category": ["X", "Y"]},
{"discount_value": 15, "name": "promotion_15", "category": ["X", "Y", "Z"]}

But the query must filter all documents by discount_values retrieving the bigger ones, lets say that i need to limit the query by 2:

.find({"category": {"$in": ["X"]}})
.limit(2)
.sort("discount_value", pymongo.DESCENDING)

This example will give me discount value 1 and 10 but what i needed was 10 and 15, how can i sort before limiting (if possible) without losing performance (the collection is really big)

Upvotes: 2

Views: 2149

Answers (1)

zedfoxus
zedfoxus

Reputation: 37059

sort() gets applied first and then limit.

Scroll down to this doc: https://docs.mongodb.com/manual/reference/method/db.collection.find/

The following statements chain cursor methods limit() and sort():

db.bios.find().sort( { name: 1 } ).limit( 5 )

db.bios.find().limit( 5 ).sort( { name: 1 } )

The two statements are equivalent; i.e. the order in which you chain the limit() and the sort() methods is not significant. Both statements return the first five documents, as determined by the ascending sort order on ‘name’.

Sort order testing

Sorting actually works correctly for me. Using MongoDB shell version v3.6.3 on Ubuntu 18.04 server, I loaded a file like this:

[{"discount_value": 1, "name": "promotion_1", "category": ["X"]},
{"discount_value": 10, "name": "promotion_10", "category": ["X", "Y"]},
{"discount_value": 15, "name": "promotion_15", "category": ["X", "Y", "Z"]}]

using mongoimport --db test --collection test1 --drop --file testing.txt --jsonArray

From mongo prompt, I tried to find and sort on discount_value descending and saw that 15 was on the top.

> db.test1.find({"category": {"$in": ["X"]}}).sort( {discount_value: -1} )
{ "_id" : ObjectId("5cb4beefea2d524413d8df57"), "discount_value" : 15, "name" : "promotion_15", "category" : [ "X", "Y", "Z" ] }
{ "_id" : ObjectId("5cb4beefea2d524413d8df56"), "discount_value" : 10, "name" : "promotion_10", "category" : [ "X", "Y" ] }
{ "_id" : ObjectId("5cb4beefea2d524413d8df55"), "discount_value" : 1, "name" : "promotion_1", "category" : [ "X" ] }

Limit testing

Notice how limit being before or after sort doesn't a difference in the output.

Limit after sort gives the same result as limit before sort.

> db.test1.find({"category": {"$in": ["X"]}}).sort( {discount_value: -1} ).limit(2)
{ "_id" : ObjectId("5cb4beefea2d524413d8df57"), "discount_value" : 15, "name" : "promotion_15", "category" : [ "X", "Y", "Z" ] }
{ "_id" : ObjectId("5cb4beefea2d524413d8df56"), "discount_value" : 10, "name" : "promotion_10", "category" : [ "X", "Y" ] }

vs.

> db.test1.find({"category": {"$in": ["X"]}}).limit(2).sort( {discount_value: -1} )
{ "_id" : ObjectId("5cb4beefea2d524413d8df57"), "discount_value" : 15, "name" : "promotion_15", "category" : [ "X", "Y", "Z" ] }
{ "_id" : ObjectId("5cb4beefea2d524413d8df56"), "discount_value" : 10, "name" : "promotion_10", "category" : [ "X", "Y" ] }

Sorting performance

The best I can tell you is to use indexes as suggested in their manual https://docs.mongodb.com/manual/tutorial/sort-results-with-indexes/ and use explain to understand where query bottlenecks exists for your kind of workload with https://docs.mongodb.com/manual/reference/method/db.collection.explain/#db.collection.explain

Upvotes: 2

Related Questions