user2988332

How to weight documents to create sort criteria?

I'm trying to aggregate a collection in which there are documents that look like this:

[
  {  
    "title" : 1984,
    "tags" : ['dystopia', 'apocalypse', 'future',....]
  },
  ....
]

And I have an array of keywords to use as criteria, for instance:

var keywords = ['future', 'google', 'cat',....]

What I would like to achieve is to aggregate the collection, using $group to compute a "convenience" score, so that documents are sorted by how many of the keywords their tags field contains.

For example, a document whose tags contain 'future', 'google' and 'cat' (three keywords) should be sorted before one whose tags contain 'future', 'cat' and 'apple' (two keywords).
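To make the intended ordering concrete, here is a minimal plain-JavaScript sketch that scores each document by how many of its tags appear in `keywords` and sorts by that score, highest first. It runs in memory, not in MongoDB; the sample data and the names `convenience` and `sortByConvenience` are hypothetical, chosen only for illustration.

```javascript
// Invented sample documents; only `tags` matters for the score.
var books = [
  { title: "B", tags: ["future", "cat", "apple"] },
  { title: "A", tags: ["future", "google", "cat"] },
  { title: "C", tags: ["dystopia", "apocalypse"] }
];
var keywords = ["future", "google", "cat"];

// Number of the document's tags that appear in the keyword list.
function convenience(doc, keywords) {
  return doc.tags.filter(function (tag) {
    return keywords.indexOf(tag) !== -1;
  }).length;
}

// New array of {title, convenience}, highest score first.
function sortByConvenience(docs, keywords) {
  return docs
    .map(function (doc) {
      return { title: doc.title, convenience: convenience(doc, keywords) };
    })
    .sort(function (a, b) {
      return b.convenience - a.convenience;
    });
}

var ranked = sortByConvenience(books, keywords);
console.log(ranked.map(function (d) { return d.title + ":" + d.convenience; }).join(", "));
// → A:3, B:2, C:0
```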

So far, I have tried something like this:

db.books.aggregate(
   { $group : { _id : {title:"$title"} , convenience: { $sum: { $cond: [ {tags: {$in: keywords}}, 1, 0 ] } } } },
            { $sort : {'convenience': -1}})

But $in written this way is query syntax, not a boolean expression that $cond can evaluate, so it does not work. I've looked around and couldn't find any operator that would help with this.

Neil Lunn

As you said, you need a logical (boolean) expression for $cond to evaluate. It's a bit terse, but here is an implementation using $or:

db.books.aggregate([
    {$unwind: "$tags" },
    {$group: {
        _id: "$title",
        weight: {
            $sum: {$cond: [
                // Test *equality* of each unwound `tags` value against every keyword in the list
                {$or: [
                    {$eq: ["$tags", "future"]},
                    {$eq: ["$tags", "google"]},
                    {$eq: ["$tags", "cat"]}
                ]},
                1, 0
            ]}
        }
    }}
])

I'll leave the rest of the implementation up to you, but this shows the basic construction for the matching you want to do.
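To see what that pipeline computes, here is a rough in-memory emulation of its two stages in plain JavaScript: $unwind turns each book into one record per tag, and $group adds 1 for every record whose tag equals one of the keywords (the $cond/$or test) and 0 otherwise. It is a sketch, not MongoDB itself; the helper names `unwindTags` and `groupWeights` and the sample data are hypothetical.

```javascript
// Invented sample documents, shaped like the ones in the question.
var books = [
  { title: "1984", tags: ["dystopia", "apocalypse", "future"] },
  { title: "Other", tags: ["future", "google", "cat"] }
];
var keywords = ["future", "google", "cat"];

// Emulates {$unwind: "$tags"}: one record per array element.
// Missing or empty tags yield no records, as with $unwind's default behaviour.
function unwindTags(docs) {
  var out = [];
  docs.forEach(function (doc) {
    (doc.tags || []).forEach(function (tag) {
      out.push({ title: doc.title, tags: tag });
    });
  });
  return out;
}

// Emulates the $group stage: _id is the title, weight sums the $cond result.
function groupWeights(records, keywords) {
  var weights = {};
  records.forEach(function (rec) {
    // Same test as {$or: [{$eq: ["$tags", k1]}, {$eq: ["$tags", k2]}, ...]}
    var matches = keywords.some(function (k) { return rec.tags === k; });
    weights[rec.title] = (weights[rec.title] || 0) + (matches ? 1 : 0);
  });
  return Object.keys(weights).map(function (title) {
    return { _id: title, weight: weights[title] };
  });
}

var grouped = groupWeights(unwindTags(books), keywords);
console.log(grouped.map(function (g) { return g._id + ":" + g.weight; }).join(", "));
// → 1984:1, Other:3
```

MongoDB does not guarantee the order of $group output, which is why a $sort on the computed weight is still needed afterwards.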

Addition

From your comments, there also seems to be a programming issue you are struggling with: how to perform an aggregation like this when the items to match come as an array, in the form you gave above:

var keywords = ['future', 'google', 'cat',....]

Since this structure cannot be used directly in the pipeline condition, you need to transform it into the form the condition expects. Each language has its own approach, but in JavaScript:

var keywords = ['future', 'google', 'cat'];
var orCondition = [];

keywords.forEach(function(value) {
    var doc = {$eq: [ "$tags", value ]};
    orCondition.push(doc);
});

And then just define the aggregation query with the orCondition variable in place:

db.books.aggregate([
    {$unwind: "$tags" },
    {$group: {
        _id: "$title",
        weight: {
            $sum: {$cond: [
                {$or: orCondition},
                1, 0
            ]}
        }
    }}
])

The same goes for any other part of the pipeline you need to build dynamically. This is generally how it is done in practice, where you would almost never hard-code a data structure like this.
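Following the same idea, the whole pipeline can be generated from the keyword list, including the {$sort: {weight: -1}} stage the question was after. This sketch only builds the pipeline array; `buildWeightPipeline` is a hypothetical name, and in the mongo shell you would pass its result to db.books.aggregate(...).

```javascript
// Build the full aggregation pipeline for a given keyword list.
function buildWeightPipeline(keywords) {
  // One {$eq: ["$tags", keyword]} clause per keyword, combined with $or below.
  var orCondition = keywords.map(function (value) {
    return { $eq: ["$tags", value] };
  });
  return [
    { $unwind: "$tags" },
    { $group: {
        _id: "$title",
        // 1 for each tag that equals one of the keywords, else 0.
        weight: { $sum: { $cond: [ { $or: orCondition }, 1, 0 ] } }
    }},
    // Titles with the most matching tags first.
    { $sort: { weight: -1 } }
  ];
}

var pipeline = buildWeightPipeline(["future", "google", "cat"]);
console.log(JSON.stringify(pipeline[2])); // → {"$sort":{"weight":-1}}
// In the mongo shell: db.books.aggregate(pipeline)
```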
