MongoDB custom sort order for a query with pagination

I have some documents in a MongoDB collection with this schema:

{
    "_id": {
        "$oid": "60c1e8e318afd80016ce58b1"
    },
    "searchPriority": 1,
    "isLive": false,
    "vehicleCondition": "USED",
    "vehicleDetails": {
        "city": "Delhi"
    }
},
{
    "_id": {
        "$oid": "60c1f2f418afd80016ce58b5"
    },
    "searchPriority": 2,
    "isLive": false,
    "vehicleCondition": "USED",
    "vehicleDetails": {
        "city": "Delhi"
    }
},
{
    "_id": {
        "$oid": "60cb429eadd33c00139d2be7"
    },
    "searchPriority": 1,
    "isLive": false,
    "vehicleCondition": "USED",
    "vehicleDetails": {
        "city": "Gurugram"
    }
},
{
    "_id": {
        "$oid": "60c21be618afd80016ce5905"
    },
    "searchPriority": 2,
    "isLive": false,
    "vehicleCondition": "USED",
    "vehicleDetails": {
        "city": "New Delhi"
    }
},
{
    "_id": {
        "$oid": "60e306d29e452d00134b978f"
    },
    "searchPriority": 3,
    "isLive": false,
    "vehicleCondition": "USED",
    "vehicleDetails": {
        "city": "New Delhi"
    }
}

vehicleCondition can be NEW or USED, isLive can be true or false, and searchPriority is an integer from 1 to 3 (a lower number means the document should rank higher in the search results).

Apart from _id, none of the fields are unique. I have created a compound index on isLive, vehicleDetails.city and searchPriority.

In my application I will perform some queries of this form:

For this, I can do a find query like this:

db.collection.find({"isLive": true, "vehicleDetails.city": { $in: [ "Gurugram", "Delhi", "New Delhi" ] }, "vehicleCondition": "USED" }, {})

I want the results of this query sorted in this order:

How can I do this? Since the number of documents returned by this query could be very large, I will be using pagination to limit the number of returned documents. Will this extra requirement have any effect on the possible solution for this problem?

Upvotes: 5

Views: 1745

Answers (3)

Dr. Selva Mary G

Reputation: 688

I hope this works for you:

let x = ["Gurugram", "Delhi", "New Delhi"];

db.collection.aggregate([
    {
        $match: {
            "isLive": true,
            "vehicleDetails.city": {
                $in: x
            },
            "vehicleCondition": "USED"
        }
    },
    {
        $project: {
            "_id": 1,
            "searchPriority": 1,
            "isLive": 1,
            "vehicleCondition": 1,
            "vehicleDetails": 1,
            index: { $indexOfArray: [x, "$vehicleDetails.city"] },
        }
    },
    { $sort: { index: 1, searchPriority: 1 } },
    {
        $project: {
            "index": 0,
        }
    }
]).toArray();
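To make the ordering concrete, here is a minimal in-memory JavaScript sketch of what the pipeline above computes; it is not MongoDB code. `Array.prototype.indexOf` plays the role of `$indexOfArray`, and the comparator mirrors `{ index: 1, searchPriority: 1 }`. The helper name `sortByCityThenPriority` is hypothetical, and the documents are the question's samples trimmed to the sort fields, with short stand-in `_id` values.

```javascript
// Hypothetical helper: orders documents the way the $sort stage above does.
function sortByCityThenPriority(docs, cities) {
  return docs
    // Same role as $indexOfArray: position of the city in the input list.
    .map((doc) => ({ doc, index: cities.indexOf(doc.vehicleDetails.city) }))
    // Same as { index: 1, searchPriority: 1 }: city order first, then priority.
    .sort((a, b) => a.index - b.index || a.doc.searchPriority - b.doc.searchPriority)
    // Same as the final $project: drop the helper field again.
    .map(({ doc }) => doc);
}

// Sample documents from the question, with stand-in _id values.
const docs = [
  { _id: 'a', searchPriority: 1, vehicleDetails: { city: 'Delhi' } },
  { _id: 'b', searchPriority: 2, vehicleDetails: { city: 'Delhi' } },
  { _id: 'c', searchPriority: 1, vehicleDetails: { city: 'Gurugram' } },
  { _id: 'd', searchPriority: 2, vehicleDetails: { city: 'New Delhi' } },
  { _id: 'e', searchPriority: 3, vehicleDetails: { city: 'New Delhi' } },
];

console.log(sortByCityThenPriority(docs, ['Gurugram', 'Delhi', 'New Delhi']).map((d) => d._id));
// prints [ 'c', 'a', 'b', 'd', 'e' ]
```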

Upvotes: 1

Tom Slabbaert

Reputation: 22316

I've read the other answer (which gives a technical solution); however, based on your comments and requirements, it is not suitable.

First, using aggregate here, while it technically solves the problem, has some issues.

As you mentioned, the query can match a large number of documents. Because the $sort is on a computed field, it cannot use an index, so the pipeline has to read and sort ALL of the matching documents before it can return even a single page; this will inevitably cause performance issues. I also saw you mention something about not having an index; without one, every single API call will cause a collection scan.

What I recommend you do is:

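  1. First, you absolutely must build a compound index on isLive, vehicleCondition and "vehicleDetails.city" if you don't have one; this is simply a must for high-scale usage. Adding searchPriority as the last key also lets each per-city query below use the index for its sort.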

  2. Now that we've got that out of the way, I recommend you split your call into several pieces. I'm going to paste some pseudocode that might look a little all over the place, but I do believe this is the best approach you can achieve with Mongo, as each of these queries is supposed to be efficient by using the previously built index.

I will briefly explain the methodology: we want to be able to query each city independently of the other cities, so we can apply the "custom sort" without needing to load all the matches into memory.

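To do so, we need to know how many documents each city needs to "skip" and "limit". For example, the limit for city #2 (Delhi) will be (limit - the number of results taken from city #1 (Gurugram)), and its skip will be (skip - city #1 matches), never below 0.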
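The skip arithmetic generalizes: each city skips whatever part of the global skip was not already used up by the cities before it. A minimal JavaScript sketch, where `citySkips` is a hypothetical helper and `counts` holds the match count of each city in the custom order (the numbers in the usage line are made up):

```javascript
// Hypothetical helper: per-city skip values for a global `skip`.
// counts[i] is the number of matches for the i-th city in the custom order.
function citySkips(counts, skip) {
  const skips = [];
  let before = 0; // total matches in all earlier cities
  for (const count of counts) {
    // Whatever the earlier cities did not absorb, this city must skip.
    skips.push(Math.max(skip - before, 0));
    before += count;
  }
  return skips;
}

// Made-up counts: Gurugram 5, Delhi 3, New Delhi 4; global skip = 6.
console.log(citySkips([5, 3, 4], 6)); // prints [ 6, 1, 0 ]
```

Here Gurugram's skip (6) is at least its count (5), so it contributes nothing to the page, Delhi skips its first match, and New Delhi starts from its first match.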

Here is the pseudocode; I left it simple on purpose so it is easy to understand. I will, however, add some ideas for basic improvements at the end.

// Node.js driver: `db` is a connected Db instance; 'collection' is assumed
// from the question's db.collection.find(...), so use your real collection name.
const coll = db.collection('collection');

const limit = 10; // determined by req?
const skip = 0; // determined by req?
const cities = ['Gurugram', 'Delhi', 'New Delhi'];
const baseFilter = { 'isLive': true, 'vehicleCondition': 'USED' };

// we need the counts to resolve the proper skip / limit. the last city's count is not needed.
const counts = await Promise.all(
    cities.slice(0, -1).map((city) =>
        coll.countDocuments({ ...baseFilter, 'vehicleDetails.city': city }))
);

// first city initial skip
const citySkips = [skip];
let matchesBefore = 0;
for (let i = 0; i < counts.length; i++) {
    // if the earlier cities have x results in total, the next city skips (skip - x), never below 0.
    matchesBefore += counts[i];
    citySkips.push(Math.max(skip - matchesBefore, 0));
}

let finalResults = [];
for (let i = 0; i < cities.length; i++) {
    const cityLimit = limit - finalResults.length;
    if (cityLimit <= 0) {
        break;
    }
    // we skip over ALL city i results (there is no count for the last city).
    if (i < counts.length && citySkips[i] >= counts[i]) {
        continue;
    }
    const cityResults = await coll.find({ ...baseFilter, 'vehicleDetails.city': cities[i] })
        // _id breaks ties, since searchPriority alone is not unique.
        .sort({ searchPriority: 1, _id: 1 })
        .skip(citySkips[i])
        .limit(cityLimit)
        .toArray();
    finalResults = finalResults.concat(cityResults);
}

OK, so here are some possible improvements you can make:

  • If the database doesn't get updated too often, or you don't need perfect accuracy, you can pre-calculate each city's match count in advance (once a day? once a week? depending on your app). This will speed up the countDocuments part, which is needed to determine the skip and limit of each city.
  • The last for loop can also use Promise.all, like the counts, to speed up results; this works because each city's skip and limit can be computed up front from the counts instead of from finalResults.length. Again, if the number of cities is never too high, this could be a good solution.
  • Lastly, it wasn't clear whether a single vehicle can be related to more than one city; if that is the case, you need to add a condition that excludes documents that were already matched.
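The parallel variant from the second bullet can be sketched as follows. To keep it runnable without a database, `countFn` and `findFn` are hypothetical stand-ins for `countDocuments` and `find().sort().skip().limit().toArray()`, backed here by an in-memory object; `fetchPage` is likewise a hypothetical name. A city whose limit works out to 0 is not queried at all, because `limit(0)` in MongoDB means "no limit".

```javascript
// Hypothetical sketch: fetch one page across cities with parallel queries.
// countFn(city) -> Promise<number>; findFn(city, skip, limit) -> Promise<doc[]>.
async function fetchPage(cities, skip, limit, countFn, findFn) {
  // Counts for every city except the last (its count is never needed).
  const counts = await Promise.all(cities.slice(0, -1).map(countFn));

  const queries = [];
  let before = 0;        // matches in earlier cities
  let remaining = limit; // page slots not yet assigned
  cities.forEach((city, i) => {
    const citySkip = Math.max(skip - before, 0);
    // The last city has no count, so it may fill whatever is left.
    const available = i < counts.length ? Math.max(counts[i] - citySkip, 0) : Infinity;
    const cityLimit = Math.min(available, remaining);
    remaining -= cityLimit;
    if (i < counts.length) before += counts[i];
    // limit(0) means "no limit" in MongoDB, so empty slices are not queried.
    if (cityLimit > 0) queries.push(findFn(city, citySkip, cityLimit));
  });

  // Promise.all keeps the input order, so the city order is preserved.
  return (await Promise.all(queries)).flat();
}

// In-memory stand-in for the collection, each city already in searchPriority order.
const data = {
  Gurugram: ['g1', 'g2'],
  Delhi: ['d1', 'd2', 'd3'],
  'New Delhi': ['n1', 'n2'],
};
const countFn = async (city) => data[city].length;
const findFn = async (city, s, l) => data[city].slice(s, s + l);

fetchPage(['Gurugram', 'Delhi', 'New Delhi'], 1, 4, countFn, findFn)
  .then((page) => console.log(page)); // prints [ 'g2', 'd1', 'd2', 'd3' ]
```

Computing every slice before issuing any query is what makes the parallelism safe; the sequential loop above instead derives each limit from the results already collected.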

Upvotes: 1

s7vr

Reputation: 75984

You could add a custom sort-order field after the $match stage, then sort the documents by the order of the input cities and by search priority. Since it is a calculated field, the index will not be used for sorting.

You can add pagination at the end ($skip and $limit stages) just as you would for any other query.

Something like:

db.collection.aggregate([
  {
    $match: {
      "isLive": true,
      "vehicleDetails.city": {
        $in: [
          "Gurugram",
          "Delhi",
          "New Delhi"
        ]
      },
      "vehicleCondition": "USED"
    }
  },
  {
    "$addFields": {
      "cityIndex": {
        "$indexOfArray": [
          [
            "Gurugram",
            "Delhi",
            "New Delhi"
          ],
          "$vehicleDetails.city"
        ]
      }
    }
  },
  {
    $sort: {
      cityIndex: 1,
      searchPriority: 1
    }
  },
  {
    $project: {
      cityIndex: 0
    }
  }
])

A working example can be found here: https://mongoplayground.net/p/16-YOkPotLX
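For the pagination part, `$skip` and `$limit` stages can follow the `$sort`. A minimal sketch of building that pipeline in JavaScript; `buildPagedPipeline`, `page` and `pageSize` are hypothetical names, `page` is assumed to be zero-based, and adding `_id` to the sort is an assumption made so that documents with equal `searchPriority` come back in a deterministic order from one page to the next. The returned array can be passed to `db.collection.aggregate(...)`.

```javascript
// Hypothetical helper: the pipeline above plus pagination stages.
function buildPagedPipeline(cities, page, pageSize) {
  return [
    { $match: { isLive: true, 'vehicleDetails.city': { $in: cities }, vehicleCondition: 'USED' } },
    { $addFields: { cityIndex: { $indexOfArray: [cities, '$vehicleDetails.city'] } } },
    // _id is an assumed tie-breaker; searchPriority alone is not unique.
    { $sort: { cityIndex: 1, searchPriority: 1, _id: 1 } },
    { $skip: page * pageSize }, // page is zero-based
    { $limit: pageSize },
    { $project: { cityIndex: 0 } },
  ];
}

const pipeline = buildPagedPipeline(['Gurugram', 'Delhi', 'New Delhi'], 2, 10);
console.log(JSON.stringify(pipeline[3]), JSON.stringify(pipeline[4]));
// prints {"$skip":20} {"$limit":10}
```

The sort still runs over every matched document, which is the cost the earlier answer points out; pagination only limits what is returned.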

Upvotes: 1
