Maximilian Stroh

Reputation: 1086

MongoDB: find documents that match the most tags

In my Meteor app, I have a huge collection of documents, each with a tags field, roughly like this:

{..., tags: ["a","b","c"], ...},
{..., tags: ["a","b","d"], ...},
{..., tags: ["b","c","e"], ...},
{..., tags: ["x","y","z"], ...},
....

Now I want to query the collection on the server with some tags, e.g. ["a","d","y"], and get all documents that match at least one tag, with the result set sorted by the number of matching tags. So, for the example set, the result should be:

{..., tags: ["a","b","d"], ...},
{..., tags: ["a","b","c"], ...},
{..., tags: ["x","y","z"], ...}

because the first document has two matches, "a" and "d", while the other two have one match each, "a" and "y" respectively.

I know that I can use $in to match all documents that have at least one matching tag, and $all to get the documents that contain every queried tag, but neither sorts the results by the number of matches. I could also use mongoDB's aggregate framework if needed.

What would the needed query look like?

Upvotes: 5

Views: 2084

Answers (1)

BatScream

Reputation: 19700

I could also use mongoDB's aggregate framework if needed.

You need to use the aggregation pipeline, which can be built from the following stages:

  • Match the documents having at least one matching value in the tags array.
  • Since we will unwind the tags array and work on its elements, keep a copy of the original tags array in each record.
  • Unwind the tags array.
  • Match the unwound records whose tags value is present in the input array.
  • Group by the _id field and count the records in each group, which is the number of matching tags for that document.
  • Sort the groups by their number of matches, in descending order.
  • Project the required fields along with the copy of the original tags array.
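To make the intent of these stages concrete, here is a rough equivalent in plain JavaScript on an in-memory array. It is only a sketch of the logic, not MongoDB code; rankByMatches and the sample docs array are made-up names for illustration:

```javascript
// Plain-JavaScript sketch of what the stages compute; no MongoDB calls involved.
// rankByMatches and docs are hypothetical names used only for this illustration.
function rankByMatches(docs, inp) {
  return docs
    // count how many of each document's tags appear in the input array
    .map(function (doc) {
      var matched = doc.tags.filter(function (t) {
        return inp.indexOf(t) !== -1;
      });
      return { tags: doc.tags, noOfMatches: matched.length };
    })
    // drop documents without any matching tag
    .filter(function (r) {
      return r.noOfMatches > 0;
    })
    // most matches first
    .sort(function (a, b) {
      return b.noOfMatches - a.noOfMatches;
    });
}

var docs = [
  { tags: ["a", "b", "c"] },
  { tags: ["a", "b", "d"] },
  { tags: ["b", "c", "e"] },
  { tags: ["x", "y", "z"] }
];

var ranked = rankByMatches(docs, ["a", "d", "y"]);
console.log(ranked);
```

On a huge collection you would not want to pull every document into memory like this, which is why the counting is done on the server by the pipeline.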

Code:

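var inp = ["a","d","y"];

db.collection.aggregate([
    // keep only documents that share at least one tag with inp
    {$match: {"tags": {$in: inp}}},
    // keep a copy of the full tags array before unwinding
    {$project: {"tagsCopy": "$tags", "tags": 1}},
    // one record per tag
    {$unwind: "$tags"},
    // keep only the records whose tag is in inp
    {$match: {"tags": {$in: inp}}},
    // count the matching tags per document and restore the original array
    {$group: {"_id": "$_id", "noOfMatches": {$sum: 1}, "tags": {$first: "$tagsCopy"}}},
    // most matches first
    {$sort: {"noOfMatches": -1}},
    // drop noOfMatches here if it is not needed, and add any other fields
    // you need (they must also be carried through the first $project and the $group)
    {$project: {"_id": 0, "noOfMatches": 1, "tags": 1}}
])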

Output (documents with the same number of matches come back in no guaranteed order):

{ "noOfMatches" : 2, "tags" : [ "a", "b", "d" ] }
{ "noOfMatches" : 1, "tags" : [ "x", "y", "z" ] }
{ "noOfMatches" : 1, "tags" : [ "a", "b", "c" ] }
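If your server is MongoDB 2.6 or later, a shorter variant might work as well, since the $setIntersection and $size expression operators are available there. A minimal sketch, assuming tags is always an array; the pipeline variable is just an illustrative name. Note that $setIntersection treats arrays as sets, so a tag repeated within one document is counted only once:

```javascript
var inp = ["a", "d", "y"];

// Assumption: MongoDB 2.6+ and every document's tags field is an array.
var pipeline = [
  // keep only documents that share at least one tag with inp
  { $match: { tags: { $in: inp } } },
  // count the distinct tags that the document has in common with inp
  { $project: {
      tags: 1,
      noOfMatches: { $size: { $setIntersection: ["$tags", inp] } }
  } },
  // most matches first
  { $sort: { noOfMatches: -1 } }
];

// In the mongo shell you would then run:
// db.collection.aggregate(pipeline)
```

This avoids the $unwind and $group stages, so there is no need to keep a copy of the tags array.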

Upvotes: 6
