MongoDB/Mongoose weight records with non-empty field

Question

I have a MongoDB collection of documents. I've already assigned weights to specific fields, but I need to weight records with any non-empty name to the top. I don't want to sort by the name; I'd just like records with a name to appear before any without one.

An example schema:

var MySchema = new Schema({
  slug: {
    type: String,
    // dropDups is no longer supported (removed in MongoDB 3.0)
    index: { unique: true }
  },
  name: String,
  body: {
    type: String,
    required: true
  }
});

Example index:

MySchema.index({
    name:'text',
    body:'text'
}, {
    name: 'best_match_index',
    weights: {
      name: 10,
      body: 1
    }
});

The find query:

MyModel.find( criteria, { score : { $meta: 'textScore' } })
  .sort({ score : { $meta : 'textScore' } })
  .skip(offset)
  .limit(per_page)

Neil Lunn · Accepted Answer

If I understand your meaning here, what you are saying is that given documents like this:

{ "name" : "term", "body" : "unrelated" }
{ "name" : "unrelated", "body" : "unrelated" }
{ "body" : "term" }
{ "body" : "term term" }
{ "name" : "unrelated", "body" : "term" }

A normal search for "term" would produce results like this:

{ "name" : "term", "body" : "unrelated", "score" : 11 }
{ "body" : "term term", "score" : 1.5 }
{ "body" : "term", "score" : 1.1 }
{ "name" : "unrelated", "body" : "term", "score" : 1.1 }

But what you want is for that last entry to come out as the second entry.

For this you need a "dynamic" projection of another field to "weight" on, which is where you would use the aggregation framework:

MyModel.aggregate([
    { "$match": {
        "$text": { "$search": "term" } 
    }},
    { "$project": {
        "slug": 1,
        "name": 1,
        "body": 1,
        "textScore": { "$meta": "textScore" },
        "nameScore": { 
            "$cond": [
                { "$ne": [{ "$ifNull": [ "$name", "" ] }, ""] },
                1,
                0
            ]
        }
    }},
    { "$sort": { "nameScore": -1, "textScore": -1 } },
    { "$skip": offset },
    { "$limit": per_page }
],function(err,results) {
    if (err) throw err;

    console.log( results );
})

Which places the items with a "name" field above those without:

{ "name" : "term", "body" : "unrelated", "textScore" : 11, "nameScore" : 1 }
{ "name" : "unrelated", "body" : "term", "textScore" : 1.1, "nameScore" : 1 }
{ "body" : "term term", "textScore" : 1.5, "nameScore" : 0 }
{ "body" : "term", "textScore" : 1.1, "nameScore" : 0 }

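Essentially, the $ifNull operator inside the $cond ternary substitutes an empty string when the "name" field is missing or null, and the $ne comparison against "" then makes $cond return 1 where a non-empty name is present or 0 where it is not. Because the comparison is against "", a document whose "name" is an empty string is treated the same as one with no "name" at all.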
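To make that test concrete, here is a plain JavaScript sketch (in memory, not MongoDB itself) that mirrors the "nameScore" expression. The function name nameScore and the sample documents are illustrative only.

```javascript
// Plain-JavaScript mirror of the "nameScore" expression used in the pipeline:
//   { "$cond": [ { "$ne": [ { "$ifNull": [ "$name", "" ] }, "" ] }, 1, 0 ] }
// This runs in memory only; it is an illustration, not a MongoDB API.
function nameScore(doc) {
  // $ifNull: a missing (undefined) or null "name" is replaced by "".
  const name = doc.name ?? "";
  // $ne against "": anything other than an empty string counts as a name.
  return name !== "" ? 1 : 0;
}

const docs = [
  { name: "term", body: "unrelated" },
  { body: "term" },
  { name: null, body: "term" },
  { name: "", body: "term term" },
];

console.log(docs.map(nameScore)); // [ 1, 0, 0, 0 ]
```

Only the first document scores 1: the missing, null and empty-string names all fall through to 0.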

This value is then passed to the $sort stage, which sorts on "nameScore" first to float those items to the top, and then on "textScore".
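The two-key sort can be sketched in memory as a comparator that looks at "nameScore" first and only falls back to "textScore" on a tie. The scores below are the example values from this answer, not live output, and the function name byNameThenText is hypothetical.

```javascript
// In-memory sketch of { "$sort": { "nameScore": -1, "textScore": -1 } }:
// compare by nameScore descending first, then break ties by textScore descending.
const results = [
  { name: "term", body: "unrelated", textScore: 11, nameScore: 1 },
  { body: "term term", textScore: 1.5, nameScore: 0 },
  { body: "term", textScore: 1.1, nameScore: 0 },
  { name: "unrelated", body: "term", textScore: 1.1, nameScore: 1 },
];

function byNameThenText(a, b) {
  // A non-zero nameScore difference decides the order on its own;
  // otherwise (difference is 0) the textScore difference is used.
  return (b.nameScore - a.nameScore) || (b.textScore - a.textScore);
}

const sorted = [...results].sort(byNameThenText);
console.log(sorted.map((d) => d.body));
// [ 'unrelated', 'term', 'term term', 'term' ]
```

This reproduces the ordering shown above: the named document with the 1.1 score now comes before the unnamed one with 1.5.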

The aggregation pipeline has its own implementations of $skip and $limit for use in paging.

This is essentially the same set of operations as in the .find() implementation, with "match", "project", "sort", "skip" and "limit". So there is really no difference in how this is processed; you just get a little more control over the results.
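If it helps to see those stages side by side, here is a sketch of a helper that builds the same pipeline for a given page. The name buildSearchPipeline and its parameters are hypothetical, and it assumes 1-based page numbers, so the "offset" used above works out to (page - 1) * perPage.

```javascript
// Hypothetical helper: builds the aggregation pipeline from this answer for a
// given 1-based page number. Pass the returned array to MyModel.aggregate(...).
function buildSearchPipeline(term, page, perPage) {
  return [
    // Same roles as find(): match ...
    { $match: { $text: { $search: term } } },
    // ... project (plus the extra "nameScore" field) ...
    {
      $project: {
        slug: 1,
        name: 1,
        body: 1,
        textScore: { $meta: "textScore" },
        nameScore: {
          $cond: [{ $ne: [{ $ifNull: ["$name", ""] }, ""] }, 1, 0],
        },
      },
    },
    // ... sort, then skip and limit for paging.
    { $sort: { nameScore: -1, textScore: -1 } },
    { $skip: (page - 1) * perPage },
    { $limit: perPage },
  ];
}

const pipeline = buildSearchPipeline("term", 3, 10);
console.log(pipeline.map((stage) => Object.keys(stage)[0]));
// [ '$match', '$project', '$sort', '$skip', '$limit' ]
console.log(pipeline[3].$skip); // 20
```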

Using "skip" and "limit" is not really the most performant solution, since $skip still has to pass over every document it skips, but sometimes you are stuck with it, for example when you need to provide "page numbering". If you can get away with only ever moving forwards, though, you might try tracking the last seen "textScore" and a list of "seen_ids" to a certain level of granularity, depending on how distributed your "textScore" values are. These can be passed in as an alternative to "skipping" through the results:

MyModel.aggregate([
    { "$match": {
        "$text": { "$search": "term" }
    }},
    { "$project": {
        "slug": 1,
        "name": 1,
        "body": 1,
        "textScore": { "$meta": "textScore" },
        "nameScore": { 
            "$cond": [
                { "$ne": [{ "$ifNull": [ "$name", "" ] }, ""] },
                1,
                0
            ]
        }
    }},
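    // Results are sorted on "nameScore" first, so the "last seen" position
    // needs both sort keys: last_name_score and last_score are the
    // "nameScore" and "textScore" of the final document on the previous
    // page, and seen_ids holds the ids already returned at that position.
    { "$match": {
        "_id": { "$nin": seen_ids },
        "$or": [
            { "nameScore": { "$lt": last_name_score } },
            { "nameScore": last_name_score, "textScore": { "$lte": last_score } }
        ]
    }},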
    { "$sort": { "nameScore": -1, "textScore": -1 } },
    { "$limit": page_size }
])
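To check that range-based paging visits every result exactly once, here is an in-memory JavaScript simulation (no MongoDB involved) of the same filter, sort and limit steps. It assumes the cursor carries both the last "nameScore" and the last "textScore", because "nameScore" is the primary sort key, and it adds an _id tie-break purely so the output is deterministic. The data and the names nextPage and cursor are hypothetical.

```javascript
// In-memory simulation only: results are ordered like
// { "$sort": { "nameScore": -1, "textScore": -1 } }, plus an _id tie-break.
const docs = [
  { _id: 1, nameScore: 1, textScore: 11 },
  { _id: 2, nameScore: 1, textScore: 1.1 },
  { _id: 3, nameScore: 0, textScore: 1.5 },
  { _id: 4, nameScore: 0, textScore: 1.1 },
  { _id: 5, nameScore: 0, textScore: 1.1 },
];

const bySortKeys = (a, b) =>
  b.nameScore - a.nameScore || b.textScore - a.textScore || a._id - b._id;

// Hypothetical helper: returns the next page plus an updated cursor holding the
// last seen sort keys and the ids already returned at that exact position.
function nextPage(all, cursor, pageSize) {
  const page = all
    // Equivalent of "_id": { "$nin": seen_ids }
    .filter((d) => !cursor.seenIds.includes(d._id))
    // Keep only documents at or after the last seen position in sort order
    .filter(
      (d) =>
        cursor.lastNameScore === undefined ||
        d.nameScore < cursor.lastNameScore ||
        (d.nameScore === cursor.lastNameScore && d.textScore <= cursor.lastScore)
    )
    .sort(bySortKeys)
    .slice(0, pageSize);

  if (page.length === 0) return { page, cursor };

  const last = page[page.length - 1];
  const atBoundary = (d) =>
    d.nameScore === last.nameScore && d.textScore === last.textScore;
  // Ids from earlier positions are already excluded by the range test, so only
  // ids sitting exactly on the boundary need to be remembered.
  const carried =
    last.nameScore === cursor.lastNameScore && last.textScore === cursor.lastScore
      ? cursor.seenIds
      : [];

  return {
    page,
    cursor: {
      lastNameScore: last.nameScore,
      lastScore: last.textScore,
      seenIds: [...carried, ...page.filter(atBoundary).map((d) => d._id)],
    },
  };
}

// Walk every page of size 2 until nothing is left.
let cursor = { lastNameScore: undefined, lastScore: undefined, seenIds: [] };
const collected = [];
for (;;) {
  const result = nextPage(docs, cursor, 2);
  if (result.page.length === 0) break;
  collected.push(...result.page.map((d) => d._id));
  cursor = result.cursor;
}

console.log(collected); // [ 1, 2, 3, 4, 5 ]
```

Keeping only the boundary ids is what keeps "seen_ids" small: anything sorted before the boundary is already ruled out by the score comparison.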

The only slightly unfortunate thing here is that the $meta for the textScore cannot yet be exposed to the initial $match operation, which would help in narrowing down the results without needing to run through $project first.

So you cannot really do the same full optimization that is possible with specialized operators like $geoNear, but either a text equivalent of that stage or access to the text score in the initial $match would be nice.

What you may notice here is that the objects returned from an .aggregate() operation are just raw JavaScript objects and not the Mongoose "document" objects that are returned from operations like .find(). This is "by design": since the aggregation framework allows you to "manipulate" the resulting documents, there is no guarantee that those documents still match the schema you initially queried.

Since you are not really "changing" or "re-shaping" the documents for your intended purpose, it falls to your own code to do what Mongoose otherwise does automatically behind the scenes: "casting" each raw result into the model's document type.

This listing should generally show you what you need to do:

var mongoose = require('mongoose'),
    Schema = mongoose.Schema;

mongoose.connect("mongodb://localhost/test");

var testSchema = new Schema({
  name: String,
  body: { type: String, required: true },
  textScore: Number,
  nameScore: Number
},{
  // Include virtuals (such as "id" and "favourite") in toObject()/toJSON()
  toObject: { virtuals: true },
  toJSON: { virtuals: true }
});

// The $text search below needs a text index on the collection
testSchema.index(
  { name: 'text', body: 'text' },
  { name: 'best_match_index', weights: { name: 10, body: 1 } }
);

// An example virtual, to show that the cast results behave like documents
testSchema.virtual('favourite').get(function() {
  return "Fred";
});

var Test = mongoose.model( "Test", testSchema, "textscore" );

Test.aggregate([
  { "$match": {
    "$text": { "$search": "term" }
  }},
  { "$project": {
    "name": 1,
    "body": 1,
    "textScore": { "$meta": "textScore" },
    "nameScore": {
      "$cond": [
        { "$ne": [{ "$ifNull": [ "$name", "" ] }, "" ] },
        1,
        0
      ]
    }
  }},
  { "$sort": { "nameScore": -1, "textScore": -1 } }
],function(err,result) {
  if (err) throw err;

  // aggregate() returns plain objects, so cast each one into a Test document
  // (Test.hydrate(doc) also works, and marks the document as already saved)
  result = result.map(function(doc) {
    return new Test( doc );
  });
  console.log( JSON.stringify( result, undefined, 4 ));
  process.exit();

});

Which includes the "virtual" field in the output:

[
    {
        "_id": "53d1a9b501e1b6c73aed2b52",
        "name": "term",
        "body": "unrelated",
        "favourite": "Fred",
        "id": "53d1a9b501e1b6c73aed2b52"
    },
    {
        "_id": "53d1ae1a01e1b6c73aed2b56",
        "name": "unrelated",
        "body": "term",
        "favourite": "Fred",
        "id": "53d1ae1a01e1b6c73aed2b56"
    },
    {
        "_id": "53d1ada301e1b6c73aed2b55",
        "body": "term term",
        "favourite": "Fred",
        "id": "53d1ada301e1b6c73aed2b55"
    },
    {
        "_id": "53d1ad9e01e1b6c73aed2b54",
        "body": "term",
        "favourite": "Fred",
        "id": "53d1ad9e01e1b6c73aed2b54"
    }
]
