Reputation: 1266
I am fairly new to the aggregation framework in MongoDB, but my understanding is that $addToSet adds only unique values to the array and ignores values that already exist. For some reason, the aggregate below still produces duplicates:
db.tweets.aggregate([
    {
        $group: {
            _id: "$_id",
            hashtags: {
                $addToSet: "$tweet.entities.hashtags.text"
            }
        }
    },
    { $unwind: "$hashtags" }
]);
Original Hashtags Array:
    "hashtags" : [
        {
            "indices" : [
                64,
                73
            ],
            "text" : "TONYTour"
        },
        {
            "indices" : [
                97,
                101
            ],
            "text" : "NIU"
        },
        {
            "indices" : [
                102,
                106
            ],
            "text" : "NIU"
        },
        {
            "indices" : [
                107,
                111
            ],
            "text" : "NIU"
        }
    ]
},
Result:
{
    "_id" : ObjectId("53f4aad7485aee023d000115"),
    "hashtags" : [
        "TONYTour",
        "NIU",
        "NIU",
        "NIU"
    ]
}
I attempted a second $group after the $unwind but had no success. What am I not grasping about the aggregation framework? The result I'm looking for is:
{
    "_id" : ObjectId("53f4aad7485aee023d000115"),
    "hashtags" : [
        "TONYTour",
        "NIU"
    ]
}
Upvotes: 1
Views: 2290
Reputation: 3198
My guess would be that your problem is described in this section of the documentation:
$addToSet only ensures that there are no duplicate items added to the set and does not affect existing duplicate elements. $addToSet does not guarantee a particular ordering of elements in the modified set.
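So your problem is that the duplicate hashtags are in the same document. Without an $unwind, "$tweet.entities.hashtags.text" resolves to that document's whole array of texts, and $addToSet adds the array as a single value without looking inside it. You could solve that by using $unwind first: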
db.tweets.aggregate([
    // Emit one document per hashtag before grouping
    { $unwind: "$tweet.entities.hashtags" },
    {
        $group: {
            _id: "$_id",
            hashtags: {
                $addToSet: "$tweet.entities.hashtags.text"
            }
        }
    }
]);
This will create one document per hashtag, so $addToSet sees each hashtag text as a separate value and should not add duplicate items.
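To make the difference concrete, here is a rough sketch in plain JavaScript that imitates the two pipelines on the sample document from the question. It is a simulation only, not MongoDB API code: the helpers groupAddToSet and unwindHashtags are hypothetical names invented for illustration, the _id is written as a plain string, and the real $addToSet does not guarantee element order.

```javascript
// Plain-JavaScript simulation of the two pipelines; this is not MongoDB API code.
// The sample document mirrors the hashtags array from the question.
const tweets = [
  {
    _id: "53f4aad7485aee023d000115",
    tweet: { entities: { hashtags: [
      { indices: [64, 73], text: "TONYTour" },
      { indices: [97, 101], text: "NIU" },
      { indices: [102, 106], text: "NIU" },
      { indices: [107, 111], text: "NIU" },
    ] } },
  },
];

// Hypothetical helper: imitates $group with $addToSet, keyed by _id.
// valueOf(doc) plays the role of the field path expression.
function groupAddToSet(docs, valueOf) {
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc._id)) groups.set(doc._id, []);
    const set = groups.get(doc._id);
    const value = valueOf(doc);
    // Compare serialized values so both strings and arrays can be checked.
    const key = JSON.stringify(value);
    if (!set.some((v) => JSON.stringify(v) === key)) set.push(value);
  }
  return [...groups].map(([_id, hashtags]) => ({ _id, hashtags }));
}

// Hypothetical helper: imitates $unwind on tweet.entities.hashtags.
function unwindHashtags(docs) {
  return docs.flatMap((doc) =>
    doc.tweet.entities.hashtags.map((tag) => ({
      _id: doc._id,
      tweet: { entities: { hashtags: tag } },
    }))
  );
}

// Question's pipeline: the path yields the whole array of texts,
// so the set receives one array value with its duplicates intact.
const withoutUnwind = groupAddToSet(tweets, (d) =>
  d.tweet.entities.hashtags.map((t) => t.text)
);

// Answer's pipeline: unwind first, so each text is added individually.
const withUnwind = groupAddToSet(unwindHashtags(tweets), (d) =>
  d.tweet.entities.hashtags.text
);

console.log(JSON.stringify(withoutUnwind[0].hashtags)); // [["TONYTour","NIU","NIU","NIU"]]
console.log(JSON.stringify(withUnwind[0].hashtags));    // ["TONYTour","NIU"]
```

In the first case the set holds a single nested array, which is why the trailing $unwind in the question simply hands the duplicated list back.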
Edit: Correction by Neil Lunn
Upvotes: 3