Using weights for searching in mongoose

Question

So I've read through this but still am a little bit confused on how to go about this.

My model contains various fields that are Strings, Numbers and Boolean values.

$text seems like it can only take in strings.

What if I wanted to do a search like:

model.find({petsAllowed:true, rooms:4, house:"townhouse"}).sort()

So I want it to find all the entries in MongoDB that match what I'm inputting AND sort them by how closely each entry matches the input fields.

I know mongoose supports this so I don't want to rely on a plugin.

Here's the result I want:

[ 
Document 1 (most closely matched with the input): 
    {petsAllowed:true, rooms:4, house:"townhouse"},
Document 2: {petsAllowed:false, rooms:4, house:"townhouse"},
Document 3: {petsAllowed:true, rooms:5, house:"townhouse"},
Document 4: {petsAllowed:false, rooms:3, house:"townhouse"}
]

Neil Lunn · Accepted Answer

To "weight" a response, the basic principle is that you decide which parts of a result matter most to the search you are performing, then assign each part a score in order of importance according to your rules.

This is really a MongoDB thing rather than something to code externally, because you need to analyse the results on the server, especially when you are considering something like "paging" the weighted results when there are a lot of them. To do this on the server you need the .aggregate() method.

I had already worked up my own data sample while waiting for your input, but it still serves as an example. Consider this initial sample:

{ "petsAllowed" : true,  "rooms" : 5, "type" : "townhouse" }
{ "petsAllowed" : false, "rooms" : 4, "type" : "house"     }
{ "petsAllowed" : true,  "rooms" : 4, "type" : "townhouse" }
{ "petsAllowed" : false, "rooms" : 4, "type" : "townhouse" }
{ "petsAllowed" : true,  "rooms" : 2, "type" : "townhouse" }
{ "petsAllowed" : true,  "rooms" : 3, "type" : "townhouse" }
{ "petsAllowed" : true,  "rooms" : 4, "type" : "house"     }

So that also includes a "type" field, where the match is going to be "fuzzy" as well and not just look for "exact" matches. Using the aggregation pipeline and setting up the logic from your inputs looks basically like this:

 var roomsWanted = 4,
     exact = "townhouse",
     types = [];

 // Some logic to get the "fuzzy" values
 var fuzzy = [/house/];

 // Combine exact and fuzzy    
 types.push(exact);
 fuzzy.forEach(function(fuzz) {
     types.push(fuzz);
 });

 // Perform the query
 db.houses.aggregate([
     // Match items you want and exclude others
     { "$match": { 
         "type": { "$in": types }, 
         "$or": [
             { "rooms": { "$gte": roomsWanted } },
             { "rooms": roomsWanted - 1 }
         ]
     }},

     // Calculate a score
     { "$project": {
         "petsAllowed": 1,
         "rooms": 1,
         "type": 1,
         "score": {
             "$add": [
                 // Exact match is higher than the fuzzy ones
                 // Fuzzy ones score lower than other possible matches
                 { "$cond": [
                     { "$eq": [ "$type", exact ] },
                     20,
                     2
                 ]},
                 // When petsAllowed is true you want a weight
                 { "$cond": [
                     "$petsAllowed",
                     10,
                     0
                 ]},
                 // Score depending on the roomsWanted
                 { "$cond": [
                     { "$eq": [ "$rooms", roomsWanted ] },
                     5,
                     { "$cond": [
                         { "$gt": [ "$rooms", roomsWanted ] },
                         4,
                         { "$cond": [
                             { "$eq": [ "$rooms", roomsWanted - 1 ] },
                             3,
                             0
                         ]}
                     ]}
                 ]}
             ]
         }
     }},
     // Largest score first
     { "$sort": { "score": -1 } }
 ]);

The results you get are then sorted by the generated "score" like so:

{ "petsAllowed" : true,  "rooms" : 4, "type" : "townhouse", "score" : 35 }
{ "petsAllowed" : true,  "rooms" : 5, "type" : "townhouse", "score" : 34 }
{ "petsAllowed" : true,  "rooms" : 3, "type" : "townhouse", "score" : 33 }
{ "petsAllowed" : false, "rooms" : 4, "type" : "townhouse", "score" : 25 }
{ "petsAllowed" : true,  "rooms" : 4, "type" : "house",     "score" : 17 }
{ "petsAllowed" : false, "rooms" : 4, "type" : "house",     "score" : 7  }

To break down what is happening here: the first thing is my own decision that I want anything that contains "house" in the "type" as well as any "exact matches" for the type that was selected. That logic is arbitrary, but the point is that we are going to consider both in this example.

Of course the search will want to filter out anything that you really don't want, so there is a $match pipeline stage to do this. The $in operator is used to match "type" to either the exact "townhouse" term or to a possible regular expression match of /house/. That's because I want it to, and your mileage may vary on what it is you really want to do.

There is also a condition on the number of rooms. Again, an arbitrary decision here is that I will consider anything with four rooms or more, hence the $gte condition. I also want to consider things that have one room fewer than was asked for. Arbitrary logic again, but it demonstrates what you do when you want this.
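To see which sample documents survive the filter, here is a rough plain-JavaScript mirror of that $match stage. It is only a sketch for reasoning about the conditions: passesMatch is a hypothetical helper of mine, not anything MongoDB provides, and the real filtering still happens on the server.

```javascript
// Hypothetical helper mirroring the $match stage in plain JavaScript.
function passesMatch(doc, roomsWanted, types) {
  // $in with strings and regular expressions: a string has to equal the
  // value, a regular expression has to match it
  const typeOk = types.some((t) =>
    t instanceof RegExp ? t.test(doc.type) : t === doc.type
  );
  // $or: at least roomsWanted rooms, or exactly one fewer
  const roomsOk = doc.rooms >= roomsWanted || doc.rooms === roomsWanted - 1;
  return typeOk && roomsOk;
}

// The sample data from the answer
const houses = [
  { petsAllowed: true,  rooms: 5, type: "townhouse" },
  { petsAllowed: false, rooms: 4, type: "house" },
  { petsAllowed: true,  rooms: 4, type: "townhouse" },
  { petsAllowed: false, rooms: 4, type: "townhouse" },
  { petsAllowed: true,  rooms: 2, type: "townhouse" },
  { petsAllowed: true,  rooms: 3, type: "townhouse" },
  { petsAllowed: true,  rooms: 4, type: "house" }
];

const kept = houses.filter((doc) => passesMatch(doc, 4, ["townhouse", /house/]));
console.log(kept.length); // 6 (only the two-room townhouse is dropped)
```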

After $match has done its "filtering", you move the results to the $project stage. The main point here is that you want a calculated "score" value, but you also must specify all of the fields you want to return when using this pipeline stage.

Here is where you make some choices over which "weight" to apply to each condition. The $add operator will "sum" the results that are given as its arguments, which are in turn produced by the $cond or "conditional" operator.

This is a "ternary" operator, in that it evaluates a logical "if" condition as the first argument, then returns either the second argument when true or the third argument when false. Like any ternary, when you want to test different conditions you "nest" the operators within the false argument in order to "flow through" them.
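If it helps to check the weights by hand, the same scoring logic can be written with ordinary JavaScript ternaries. This is just a sketch that mirrors the $add and $cond expression from the pipeline; scoreFor is a hypothetical helper name of mine, and the weights are the ones used in the example.

```javascript
// Plain-JavaScript mirror of the "score" expression, for checking weights
// by hand. scoreFor is a hypothetical helper, not a MongoDB API.
function scoreFor(doc, roomsWanted, exactType) {
  // $cond on type: an exact match scores 20, any other matched type scores 2
  const typeScore = doc.type === exactType ? 20 : 2;
  // $cond on petsAllowed: 10 when true, otherwise 0
  const petsScore = doc.petsAllowed ? 10 : 0;
  // Nested $cond on rooms: exact 5, more rooms 4, one fewer 3, otherwise 0
  const roomsScore =
    doc.rooms === roomsWanted ? 5
    : doc.rooms > roomsWanted ? 4
    : doc.rooms === roomsWanted - 1 ? 3
    : 0;
  // $add sums the three parts
  return typeScore + petsScore + roomsScore;
}

const fiveRooms = { petsAllowed: true, rooms: 5, type: "townhouse" };
console.log(scoreFor(fiveRooms, 4, "townhouse")); // 34
```

Each nested ternary sits in the "false" slot of the one before it, which is exactly how the nested $cond operators flow through in the pipeline.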

Once a "score" has been determined you $sort the results in the order of the largest "score" first.

Paging can be implemented either in the traditional form, by adding $skip and $limit pipeline stages at the end of the pipeline, or by a more involved "forward paging" approach: keep the last value(s) seen, exclude those from the results, and look for a "score" $lte the last "score" that was seen. That's another topic in itself, but it all depends on which paging concept suits your application best.
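As a rough sketch of both paging styles, the helpers below return the extra stages you would append after the $sort stage. The function names and parameters are my own assumptions, and the forward paging version assumes you have kept the _id values already shown to the user (the $project stage keeps _id unless you exclude it).

```javascript
// Traditional paging: skip the earlier pages, then take one page.
// pagingStages is a hypothetical helper; pageNumber starts at 1.
function pagingStages(pageNumber, pageSize) {
  return [
    { $skip: (pageNumber - 1) * pageSize },
    { $limit: pageSize }
  ];
}

// Forward paging: only keep scores at or below the last one seen, and
// exclude the documents already shown. seenIds is assumed to be an array
// of _id values collected from the previous pages.
function forwardPagingStages(lastScore, seenIds, pageSize) {
  return [
    { $match: { score: { $lte: lastScore }, _id: { $nin: seenIds } } },
    { $limit: pageSize }
  ];
}

console.log(JSON.stringify(pagingStages(3, 10)));
// [{"$skip":20},{"$limit":10}]
```

Since several documents can share a score, you may also want a second sort key such as _id so that the order within a tie is stable between requests.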

Of course for some of the logic like "petsAllowed", you only want those conditions to calculate a weight when it's actually valid to the selection criteria you want. Part of the beauty of the aggregation pipeline syntax and indeed all MongoDB query syntax is that whatever the language implementation it is basically just a representation of a "data structure". So you can just "build" the pipeline stages as required from your inputs as you would any data structure in code.
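Here is a minimal sketch of what "building" the pipeline from inputs could look like. The buildPipeline function and the shape of its criteria argument are assumptions of mine for illustration; the weights are the same arbitrary ones as in the example, and a score part is only added when the matching input was actually supplied.

```javascript
// Hypothetical builder: the criteria field names follow the sample data.
function buildPipeline(criteria) {
  const match = {};
  // Start from a literal 0 so $add always has at least one argument
  const scoreParts = [0];

  if (criteria.type !== undefined) {
    // Exact term plus any "fuzzy" regular expressions the caller supplied
    match.type = { $in: [criteria.type].concat(criteria.fuzzyTypes || []) };
    scoreParts.push({ $cond: [{ $eq: ["$type", criteria.type] }, 20, 2] });
  }

  // Only weight petsAllowed when the user actually asked for it
  if (criteria.petsAllowed === true) {
    scoreParts.push({ $cond: ["$petsAllowed", 10, 0] });
  }

  if (criteria.rooms !== undefined) {
    const wanted = criteria.rooms;
    match.$or = [{ rooms: { $gte: wanted } }, { rooms: wanted - 1 }];
    scoreParts.push({
      $cond: [
        { $eq: ["$rooms", wanted] }, 5,
        { $cond: [
          { $gt: ["$rooms", wanted] }, 4,
          { $cond: [{ $eq: ["$rooms", wanted - 1] }, 3, 0] }
        ]}
      ]
    });
  }

  return [
    { $match: match },
    { $project: { petsAllowed: 1, rooms: 1, type: 1, score: { $add: scoreParts } } },
    { $sort: { score: -1 } }
  ];
}

const pipeline = buildPipeline({
  type: "townhouse",
  fuzzyTypes: [/house/],
  petsAllowed: true,
  rooms: 4
});
console.log(pipeline.length); // 3
```

The returned array is just data, so it can be handed to db.houses.aggregate() in the shell or, with mongoose, to the model's aggregate() method.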

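Those are the principles, but of course everything comes at a cost: calculating these weightings "on the fly" is not just a simple query where the values can be looked up in an index. The "score" is computed for every document that passes $match, so the $sort on it cannot use an index.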
