Nickpick
Nickpick

Reputation: 6587

Removing duplicates in mongodb with aggregate query

db.games.aggregate([ 
{ $unwind : "$rounds"},
{ $match: {
 "rounds.round_values.gameStage": "River",
 "rounds.round_values.decision": "BetPlus" }
 },
 { $project: {"FinalFundsChange":1, "GameID":1}
    }])

The resulting output is:

{ "_id" : ObjectId("57cbce66e281af12e4d0731f"), "GameID" : "229327202", "FinalFundsChange" : 0.8199999999999998 }
{ "_id" : ObjectId("57cbe2fce281af0f34020901"), "FinalFundsChange" : -0.1599999999999997, "GameID" : "755030199" }
{ "_id" : ObjectId("57cbea3ae281af0f340209bc"), "FinalFundsChange" : 0.10000000000000009, "GameID" : "231534683" }
{ "_id" : ObjectId("57cbee43e281af0f34020a25"), "FinalFundsChange" : 1.7000000000000002, "GameID" : "509975754" }
{ "_id" : ObjectId("57cbee43e281af0f34020a25"), "FinalFundsChange" : 1.7000000000000002, "GameID" : "509975754" }

As you can see the last element is a duplicate, that's because the unwind creates two elements of it, which it should. How can I (while keeping the aggregate structure of the query) keep the first element of the duplicate or keep the last element of the duplicate only?

I have seen that the ways to do it seem to be related to either $addToSet or $setUnion (any details how this works exactly are appreciated as well), but I don't understand how I can choose the 'subset' by which I want to identify the duplicates (in my case that's the 'GameID', other values are allowed to be different) and how I can select whether I want the first or the last element.

Upvotes: 13

Views: 22997

Answers (2)

Rafiq
Rafiq

Reputation: 11465

My problem was find all users who purchase same product, where a user can purchase a product multiple time.

https://mongoplayground.net/p/UTuT4e_N6gn

db.payments.aggregate([
  {
    "$lookup": {
      "from": "user",
      "localField": "user",
      "foreignField": "_id",
      "as": "user_docs"
    }
  },
  {
    "$unwind": "$user_docs",
    
  },
  {
    "$group": {
      "_id": "$user_docs._id",
      "name": {
        "$first": "$user_docs.name"
      },
      
    }
  },
  {
    "$project": {
      "_id": 0,
      "id": "$_id",
      "name": "$name"
    }
  }
])

Upvotes: 0

tengobash
tengobash

Reputation: 367

You could group by _id via $group and then use the $last and $first operator respectively to keep the last or first values.

db.games.aggregate([ 
{ $unwind : "$rounds"},
{ $match: {
 "rounds.round_values.gameStage": "River",
 "rounds.round_values.decision": "BetPlus" }
 },
 { $group: { 
     _id: "$_id", 
     "FinalFundsChange": { $first: "$FinalFundsChange" }, 
     "GameID": { $last: "$GameID" }
   }
 }
])

Upvotes: 13

Related Questions