rajh2504

Reputation: 1266

Duplicates in aggregation with $addToSet

I am fairly new to the aggregation framework in MongoDB, but my understanding is that $addToSet only adds unique values to the array and ignores values that are already present. Yet the aggregation below still produces duplicates:

db.tweets.aggregate([
{ 
    $group: { 
        _id: "$_id",
        hashtags: { 
            $addToSet : "$tweet.entities.hashtags.text" 
        }
    }
},
{ $unwind : "$hashtags" }
]);

Original Hashtags Array:

"hashtags" : [
    { "indices" : [ 64, 73 ], "text" : "TONYTour" },
    { "indices" : [ 97, 101 ], "text" : "NIU" },
    { "indices" : [ 102, 106 ], "text" : "NIU" },
    { "indices" : [ 107, 111 ], "text" : "NIU" }
]

result:

{
    "_id" : ObjectId("53f4aad7485aee023d000115"),
    "hashtags" : [
        "TONYTour",
        "NIU",
        "NIU",
        "NIU"
    ]
}

I tried adding a second $group stage after the $unwind, but that didn't help either. What am I missing about the aggregation framework? The result I'm looking for is:

{
    "_id" : ObjectId("53f4aad7485aee023d000115"),
    "hashtags" : [
        "TONYTour",
        "NIU"
    ]
}

Upvotes: 1

Views: 2290

Answers (1)

Trudbert

Reputation: 3198

My guess would be that your problem is related to this section of the documentation:

$addToSet only ensures that there are no duplicate items added to the set and does not affect existing duplicate elements. $addToSet does not guarantee a particular ordering of elements in the modified set.

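So your problem is that the duplicate hashtags are in the same document. Because you group by "$_id", each group contains exactly one document, and "$tweet.entities.hashtags.text" resolves to that document's whole array of texts. $addToSet therefore adds the entire array as a single element, and the $unwind afterwards only unwraps that outer array. You can solve this by running $unwind first: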

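db.tweets.aggregate([
    // Emit one document per hashtag subdocument
    { $unwind: "$tweet.entities.hashtags" },
    // Group back by tweet; $addToSet now sees each hashtag text separately
    { $group: {
        _id: "$_id",
        hashtags: { $addToSet: "$tweet.entities.hashtags.text" }
    }}
]);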

This creates one document per hashtag, so $addToSet compares the individual hashtag texts and does not add the duplicates.
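To see why the order of the stages matters, here is a minimal plain-JavaScript model of both pipelines run against the sample tweet from the question. It is a sketch, not MongoDB itself: the helper names (groupById, unwindTweetHashtags, unwindHashtags, addToSet) are hypothetical, the ObjectId is replaced by a plain string, and each function only mimics the one aspect of $group, $addToSet and $unwind that matters here.

```javascript
// Plain-JavaScript model of the two pipelines; it does not connect to MongoDB.
// Assumption: each stage is reduced to the behaviour relevant to this question.

const tweets = [
  {
    _id: "53f4aad7485aee023d000115", // plain string standing in for ObjectId
    tweet: {
      entities: {
        hashtags: [
          { indices: [64, 73], text: "TONYTour" },
          { indices: [97, 101], text: "NIU" },
          { indices: [102, 106], text: "NIU" },
          { indices: [107, 111], text: "NIU" },
        ],
      },
    },
  },
];

// $addToSet: append the value unless an equal value is already in the set.
// Equality is approximated with JSON.stringify so arrays compare by value.
function addToSet(set, value) {
  const key = JSON.stringify(value);
  if (!set.some((v) => JSON.stringify(v) === key)) set.push(value);
}

// { $group: { _id: "$_id", hashtags: { $addToSet: <expr> } } }
function groupById(docs, expr) {
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc._id)) groups.set(doc._id, { _id: doc._id, hashtags: [] });
    addToSet(groups.get(doc._id).hashtags, expr(doc));
  }
  return [...groups.values()];
}

// { $unwind: "$tweet.entities.hashtags" }: one document per hashtag subdocument.
function unwindTweetHashtags(docs) {
  return docs.flatMap((doc) =>
    doc.tweet.entities.hashtags.map((h) => ({
      ...doc,
      tweet: { ...doc.tweet, entities: { ...doc.tweet.entities, hashtags: h } },
    }))
  );
}

// { $unwind: "$hashtags" } applied to the grouped output.
function unwindHashtags(docs) {
  return docs.flatMap((doc) => doc.hashtags.map((h) => ({ ...doc, hashtags: h })));
}

// Question's pipeline: the path resolves to the whole array of texts, so
// $addToSet stores that array as one element and $unwind just unwraps it.
const original = unwindHashtags(
  groupById(tweets, (doc) => doc.tweet.entities.hashtags.map((h) => h.text))
);

// Answer's pipeline: after $unwind the same path resolves to one string per
// document, so $addToSet compares individual hashtags and drops repeats.
const fixed = groupById(
  unwindTweetHashtags(tweets),
  (doc) => doc.tweet.entities.hashtags.text
);

console.log(JSON.stringify(original));
// [{"_id":"53f4aad7485aee023d000115","hashtags":["TONYTour","NIU","NIU","NIU"]}]
console.log(JSON.stringify(fixed));
// [{"_id":"53f4aad7485aee023d000115","hashtags":["TONYTour","NIU"]}]
```

The model keeps insertion order inside the set, but MongoDB makes no ordering guarantee for $addToSet, so a real run may return "NIU" before "TONYTour".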

Edit: Correction by Neil Lunn

Upvotes: 3
