How to filter efficiently in mongodb

Question

I have example documents like this,

{year:2015, brand:"mq", model:"ma"},
{year:2015, brand:"mq", model:"mb"},
{year:2015, brand:"mq", model:"mc"}

Let's query the documents with this:

{year:2015, brand:"mq", model:"ma"}

As you can imagine, this query returns only one result, but what I would like to get is this:

{year:2015, brand:"mq", model:"ma"}, -> first result that matches the most
{year:2015, brand:"mq", model:"mb"}, -> second
{year:2015, brand:"mq", model:"mc"}  -> third

I have searched a lot for this but could not find anything related. I also wrote a script that produces these results, but it is not efficient when it comes to index usage. How can I solve this? Thank you.

Blakes Seven · Accepted Answer

A text search and index would be the most efficient way to do this in terms of a basic query.

So if you create a text index like this:

db.collection.createIndex({ "year": "text", "brand": "text", "model": "text" })

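And then issue a query like this, projecting the text score and sorting on it:

db.collection.find({ "$text": { "$search": "2015 mq ma" } }, { "score": { "$meta": "textScore" } }).sort({ "score": { "$meta": "textScore" } })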

It would return results like this:

{
        "_id" : ObjectId("55cc56d5ca1af2140f5e283c"),
        "year" : 2015,
        "brand" : "mq",
        "model" : "ma",
        "score" : 2.2
}
{
        "_id" : ObjectId("55cc56d5ca1af2140f5e283d"),
        "year" : 2015,
        "brand" : "mq",
        "model" : "mb",
        "score" : 1.1
}
{
        "_id" : ObjectId("55cc56d5ca1af2140f5e283e"),
        "year" : 2015,
        "brand" : "mq",
        "model" : "mc",
        "score" : 1.1
}

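These are ranked by how many of the search terms matched across the fields specified in the index, as long as the query sorts on the textScore metadata (a $text query does not sort by score on its own). Note that a text index only covers string values, so the numeric year does not contribute: the scores above reflect one match (1.1) or two matches (2.2) on brand and model.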

Or, if you are worried about the additional space used by a text index, then you can always manually handle this with the .aggregate() method. There is some overhead in the calculation of a "score", but it is reasonably minimal:

db.collection.aggregate([
  { "$match": {
    "$or": [
      { "year": 2015 },
      { "brand": "mq" },
      { "model": "ma" }
    ]
  }},
  { "$project": {
    "year": 1, 
    "brand": 1,
    "model": 1,
    "score": {
      "$add": [
        { "$cond": [ { "$eq": [ "$year", 2015 ] }, 1, 0 ] },
        { "$cond": [ { "$eq": [ "$brand", "mq" ] }, 1, 0 ] },
        { "$cond": [ { "$eq": [ "$model", "ma" ] }, 1, 0 ] }
      ]
    }
  }},
  { "$sort": { "score": -1 } }
])

Which produces a similar result:

{
        "_id" : ObjectId("55cc56d5ca1af2140f5e283c"),
        "year" : 2015,
        "brand" : "mq",
        "model" : "ma",
        "score" : 3
}
{
        "_id" : ObjectId("55cc56d5ca1af2140f5e283d"),
        "year" : 2015,
        "brand" : "mq",
        "model" : "mb",
        "score" : 2
}
{
        "_id" : ObjectId("55cc56d5ca1af2140f5e283e"),
        "year" : 2015,
        "brand" : "mq",
        "model" : "mc",
        "score" : 2
}

But the basic method you want to follow is something that adds a "weight" to your matched terms and then sorts the results accordingly.
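
To make the idea concrete, here is a minimal sketch in plain JavaScript over an in-memory array. It only illustrates "score each match, then sort" and is not how MongoDB evaluates the query; the function names matchScore and rankByMatches and the third sample document are made up for the example.

```javascript
// Illustration only: plain JavaScript over an in-memory array, not a MongoDB query.
const docs = [
  { year: 2015, brand: "mq", model: "mb" },
  { year: 2015, brand: "mq", model: "ma" },
  { year: 2014, brand: "xx", model: "zz" } // hypothetical non-matching document
];

// Count how many fields of `doc` equal the corresponding field in `criteria`.
function matchScore(doc, criteria) {
  return Object.keys(criteria)
    .filter((field) => doc[field] === criteria[field])
    .length;
}

// Keep documents that match at least one field, then sort best match first.
function rankByMatches(docs, criteria) {
  return docs
    .map((doc) => ({ ...doc, score: matchScore(doc, criteria) }))
    .filter((doc) => doc.score > 0)
    .sort((a, b) => b.score - a.score);
}

const ranked = rankByMatches(docs, { year: 2015, brand: "mq", model: "ma" });
console.log(ranked);
```

The "ma" document matches all three fields and comes first, the "mb" document matches two and comes second, and the document that matches nothing is dropped, which mirrors what the $or in $match does in the pipeline above.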

As for which to choose, it really depends on what you want to do. Text indexes automatically assign a weight, which can be read and sorted on through the $meta "textScore" projection. You can even assign a "weight" to each field individually when creating the index, so that some fields affect results more than others if required.
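
As a rough sketch of per-field weights: createIndex() accepts a weights option alongside the text keys. The helper name textIndexSpec and the weight values below are assumptions for illustration only.

```javascript
// Hypothetical helper: builds the two arguments for createIndex().
function textIndexSpec(weights) {
  const keys = {};
  for (const field of Object.keys(weights)) {
    keys[field] = "text"; // every weighted field becomes part of the text index
  }
  return { keys, options: { weights } };
}

// Example weights are assumptions: a match on "model" counts five times
// as much as a match on "brand".
const spec = textIndexSpec({ brand: 1, model: 5 });

// In the mongo shell this would be passed on as:
//   db.collection.createIndex(spec.keys, spec.options)
console.log(JSON.stringify(spec));
```

Keep in mind that a collection can have only one text index, so an existing one would have to be dropped before creating a differently weighted one.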

By the same token, assigning the score yourself within an aggregation pipeline allows a bit more control: you can assign different weights for different queries and make up more complex rules as well.
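
One way to get that per-query control is to generate the pipeline from the search criteria instead of writing it by hand. This is only a sketch: buildScoredPipeline and its weights parameter are hypothetical names, and the weight of 3 on "model" is an arbitrary choice.

```javascript
// Hypothetical helper: builds a scoring pipeline from the search criteria.
// `weights` is optional; fields without an entry count as 1.
function buildScoredPipeline(criteria, weights = {}) {
  const fields = Object.keys(criteria);

  // One $or branch per field, so a document needs only one match to qualify.
  const orClauses = fields.map((field) => ({ [field]: criteria[field] }));

  // One $cond per field: contributes the field's weight on a match, else 0.
  // Note: string values starting with "$" would need wrapping in $literal.
  const scoreTerms = fields.map((field) => ({
    $cond: [{ $eq: ["$" + field, criteria[field]] }, weights[field] ?? 1, 0]
  }));

  // Keep the searched fields in the output alongside the computed score.
  const projection = { score: { $add: scoreTerms } };
  for (const field of fields) projection[field] = 1;

  return [
    { $match: { $or: orClauses } },
    { $project: projection },
    { $sort: { score: -1 } }
  ];
}

const pipeline = buildScoredPipeline(
  { year: 2015, brand: "mq", model: "ma" },
  { model: 3 } // assumption: a model match is worth three times the others
);

// In the mongo shell: db.collection.aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```

For the $match stage to use indexes, each field in the $or would generally need an index of its own, since every $or clause is planned separately.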

With a "text" index, this is fairly "static", as these rules are defined on the index itself. Additionally, text indexes require a bit more space than normal indexes, which matters especially if you have other queries where normal indexes would suffice.

Choosing "which is best" is really about what your application is doing with the data and the types of queries you regularly perform. As always, benchmark and observe both application and server behavior under all conditions. That is usually the best guide to which option suits you.
