Paul

Reputation: 99

MongoDB: how to match against all documents again after a $group stage

So I have a collection looking like this:

[
   {"url":"website.com/test", "links":[ {"url": "www.something.fr/page.html","scoreDiff": 0.44} ], "justUpdated": true, "score": 0.91},
   {"url":"domain.com/", "links":[], "justUpdated": true, "score": 0.81},
   {"url":"www.something.fr/page.html", "links":[], "justUpdated": false, "score": 0.42},
]

The goal is to retrieve the third document, because its url appears in the "links" array of a document where "justUpdated" is true (here, the first one).

To achieve that, I tried:

Finding all the documents where "justUpdated" is true, then, in Node.js, concatenating all the urls from their "links" arrays (let's call this array urlsOfInterest), and finally running a second query to find all the documents whose url is in urlsOfInterest.

The problem is that running the first query, processing its result, and then running the second query takes some time.

So I thought maybe I could do it all at once in an aggregate query. I use $group (with $cond to check whether justUpdated is true) to collect all the "links" arrays into one new field named urlsOfInterest. At that point it is an array of arrays of objects, so I then use $project with $reduce to flatten all these {url: "...", scoreDiff: X} objects into one big array. Finally, I use $project with $map to keep only the url, as the scoreDiff value doesn't interest me here.
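A rough reconstruction of the pipeline described above. The original code was not posted, so the exact stage contents and the name urlsOfInterestPipeline are assumptions made for illustration:

```javascript
// Assumed reconstruction of the described $group / $reduce / $map pipeline.
const urlsOfInterestPipeline = [
  {
    // Collapse the whole collection into one document; documents that were
    // not just updated contribute an empty array.
    $group: {
      _id: "urlsOfInterest",
      urlsOfInterest: {
        $push: { $cond: [{ $eq: ["$justUpdated", true] }, "$links", []] },
      },
    },
  },
  {
    // Flatten the array of arrays into a single array of link objects.
    $project: {
      urlsOfInterest: {
        $reduce: {
          input: "$urlsOfInterest",
          initialValue: [],
          in: { $concatArrays: ["$$value", "$$this"] },
        },
      },
    },
  },
  {
    // Keep only the url of each link object.
    $project: {
      urlsOfInterest: {
        $map: { input: "$urlsOfInterest", as: "link", in: "$$link.url" },
      },
    },
  },
];

// In the mongo shell this would be run as:
// db.collection.aggregate(urlsOfInterestPipeline)
```

Because $group emits a single document, every original document is gone from the pipeline after the first stage, which is the problem described next.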

So I get an output looking like this:

   _id:"urlsOfInterest",
   urlsOfInterest: ["www.something.fr/page.html"]

This is pretty great, but I am stuck: I now just need to get the documents whose url is in urlsOfInterest, and I can't, because all my documents have "disappeared" after the $group stage.

Please help me find a way to perform this final query :) Or, if this isn't the right way to do it, please point me in the right direction!

PS: the real goal is to update every scoreDiff value in the links arrays of all the documents where justUpdated is true. For our example, we compute abs(0.91 - 0.42) = 0.49, so we replace the scoreDiff value of 0.44 with 0.49. Here 0.91 is the score of the document where justUpdated is true and 0.42 is the score of the document whose url is www.something.fr/page.html, which is why I need to fetch that last document. I don't believe there is a way to do all of this at once, but if there is, please tell me!
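To make the arithmetic in the PS concrete, here is a minimal in-memory sketch in plain JavaScript, with no database involved. The helper name recomputeScoreDiffs is hypothetical, and leaving a link untouched when its target document is missing is an assumption, not something stated above:

```javascript
// Sample documents copied from the collection shown above.
const docs = [
  { url: "website.com/test", links: [{ url: "www.something.fr/page.html", scoreDiff: 0.44 }], justUpdated: true, score: 0.91 },
  { url: "domain.com/", links: [], justUpdated: true, score: 0.81 },
  { url: "www.something.fr/page.html", links: [], justUpdated: false, score: 0.42 },
];

// Hypothetical helper: returns copies of the justUpdated documents with
// every scoreDiff recomputed as abs(own score - linked document's score).
function recomputeScoreDiffs(allDocs) {
  const scoreByUrl = new Map(allDocs.map((d) => [d.url, d.score]));
  return allDocs
    .filter((d) => d.justUpdated)
    .map((d) => ({
      ...d,
      links: d.links.map((link) => {
        const linkedScore = scoreByUrl.get(link.url);
        // Assumption: keep the old value if the linked document is unknown.
        if (linkedScore === undefined) return link;
        return { ...link, scoreDiff: Math.abs(d.score - linkedScore) };
      }),
    }));
}

const updated = recomputeScoreDiffs(docs);
// Approximately 0.49, up to floating-point rounding.
console.log(updated[0].links[0].scoreDiff);
```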

Upvotes: 0

Views: 78

Answers (1)

Alex Blex

Reputation: 37048

You can use $lookup to get all matching documents in an array:

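db.collection.aggregate([
  // Start from the documents that were just updated.
  {
    "$match": {
      "justUpdated": true
    }
  },
  // Self-join: fetch every document whose url appears in links.url.
  {
    "$lookup": {
      "from": "collection",
      "localField": "links.url",
      "foreignField": "url",
      "as": "result"
    }
  },
  // Keep only the documents for which the lookup found at least one match.
  {
    "$match": {
      "result": {
        "$gt": []
      }
    }
  }
])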

Then either $unwind and $replaceRoot the "result" array to get the linked documents as a cursor and do the math at the application level, or do the calculations in the same pipeline, e.g. with $reduce.

The "PS: the real goal" part is not quite clear, as it is based on one particular example, but if you play a little with it in the playground, I am sure you can calculate the numbers as per your requirements.
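As one possible reading of the "same pipeline" option, here is a sketch that recomputes scoreDiff with $map instead of $reduce. It assumes the field names from the question, that the collection is named "collection", and that a link whose target was not found should keep its old scoreDiff; the name scoreDiffPipeline is only illustrative:

```javascript
// Sketch only: recompute links[].scoreDiff inside the aggregation.
const scoreDiffPipeline = [
  { $match: { justUpdated: true } },
  { $lookup: { from: "collection", localField: "links.url", foreignField: "url", as: "result" } },
  {
    $addFields: {
      links: {
        $map: {
          input: "$links",
          as: "link",
          in: {
            $let: {
              vars: {
                // First looked-up document whose url matches this link.
                target: {
                  $arrayElemAt: [
                    { $filter: { input: "$result", as: "r", cond: { $eq: ["$$r.url", "$$link.url"] } } },
                    0,
                  ],
                },
              },
              in: {
                $mergeObjects: [
                  "$$link",
                  {
                    // abs(own score - linked score); fall back to the old
                    // value when no linked document was found (assumption).
                    scoreDiff: {
                      $ifNull: [
                        { $abs: { $subtract: ["$score", "$$target.score"] } },
                        "$$link.scoreDiff",
                      ],
                    },
                  },
                ],
              },
            },
          },
        },
      },
    },
  },
  // Drop the temporary lookup array from the output.
  { $project: { result: 0 } },
];

// In the mongo shell: db.collection.aggregate(scoreDiffPipeline)
```

This only returns the recomputed documents. Writing the values back would still need a separate step, for example one update per returned document from the application, or a $merge stage on server versions that support it.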

Upvotes: 2
