Reputation: 2742
I'm looking to form a query that will pull a set of results from my mongo database, but remove/ignore results that have a duplicate field value.
Here is the scenario: I'm pulling many results from the Spotify API and storing them in my database, and due to the nature of what I'm doing, I end up pulling many of the same albums. These albums share an id field. Note that this is not the Mongo _id field.
What I want is to avoid returning multiple copies of the same album when the user builds a query that could include these duplicates.
Here is my current query, which does what I want but doesn't filter out the duplicates:
Albums.aggregate([
{ $match : { source_region : { $in: countries }}},
{ $skip : offset },
{ $limit : limit }
])
At first I was using the more typical Collection.find().sort() etc. and came across distinct, but you can't use sort, limit, etc. with distinct.
I've also tried using $group, but that seems to just return the field I specify, so when I try something like:
{ $group : { _id : null, uniqueValues : { $addToSet : "$id" }}}
the only field that is returned is the id field, when I need the 10-20 fields associated with that album.
If anybody could point me in the right direction that would be great!
Update 1
Here is an example of some documents in the collection:
{
_id : ObjectId("5ad965a8bc349952904f7f31"),
id : "0nEsaNZGpk0HIgY3OGCyR6",
title : "some album",
artist : "some artist"
},
{
_id : ObjectId("665fhFHJFjdjfud7d6f6"),
id : "5JUSBHF&55sdfhjkf86sd",
title : "another album",
artist : "another artist"
},
{
_id : ObjectId("56&DFHJFHJJFJSgh76sdghhsd"),
id : "0nEsaNZGpk0HIgY3OGCyR6",
title : "some album",
artist : "some artist"
}
So if this was my data, I would want to return only one of the documents that share the Spotify-generated id field.
Upvotes: 0
Views: 101
Reputation: 151092
Since you've gone pretty silent, we'll just have to make some presumptions then.
With nothing else to go on other than that you expect "one" property in your documents to define "unique" (other than _id, which already does), what you would do is something like this:
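Albums.aggregate([
// Group on the property that defines "unique", keeping the first whole document seen
{ "$group": {
"_id": "$uniqueProp",
"doc": { "$first": "$$ROOT" }
}},
// Promote the stored document back to the top level
{ "$replaceRoot": { "newRoot": "$doc" } },
{ "$skip": offset },
{ "$limit": limit }
])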
Or whatever other manipulation you want to do.
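Applied to the pipeline from the question, this might look roughly like the sketch below. It assumes the unique property is the Spotify id field and keeps the original source_region filter. The helper name buildUniqueAlbumPipeline is hypothetical, and the two $sort stages are an addition not in the original answer: $group does not guarantee output order, so without a sort the document picked by $first and the pages produced by $skip and $limit may vary between runs.

```javascript
// Hypothetical helper (not part of any library): returns the aggregation
// stages as plain objects so they can be inspected before being run.
function buildUniqueAlbumPipeline(countries, offset, limit) {
  return [
    // Same filter as in the question
    { $match: { source_region: { $in: countries } } },
    // Assumption: sort first so that $first picks a predictable document
    { $sort: { _id: 1 } },
    // One group per Spotify id, keeping the first whole document
    { $group: { _id: "$id", doc: { $first: "$$ROOT" } } },
    // Promote the kept document back to the top level
    { $replaceRoot: { newRoot: "$doc" } },
    // Assumption: sort again so that paging is stable across calls
    { $sort: { _id: 1 } },
    { $skip: offset },
    { $limit: limit }
  ];
}

const pipeline = buildUniqueAlbumPipeline(["US", "GB"], 0, 20);
console.log(JSON.stringify(pipeline, null, 2));
```

The resulting array can then be passed to Albums.aggregate(pipeline) in place of the inline stages.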
With a $group pipeline stage, the _id property is what determines "uniqueness" of results that you "group by". There is never more than 1 of the same value produced by whatever gets specified in this key. You can even have a compound value:
{ "$group": {
"_id": { "firstField": "$firstField", "secondField": "$secondField" },
"doc": { "$first": "$$ROOT" }
}}
So whatever is in there comes out unique.
Whenever you are "grouping" you need an "accumulator" for anything other than the _id key. So here we use $first to simply take the first result of any value we specify and use $$ROOT here for the whole document.
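To make the semantics concrete, here is a plain JavaScript imitation of grouping with a $first accumulator over $$ROOT. It does not talk to MongoDB at all; groupFirst is a hypothetical helper and the numeric _id values are stand-ins for the ObjectIds in the question.

```javascript
// Hypothetical in-memory imitation of { $group: { _id: key, doc: { $first: "$$ROOT" } } }
function groupFirst(docs, keyOf) {
  const firstByKey = new Map();
  for (const doc of docs) {
    // Serialise the key so that compound (object) keys compare by value
    const key = JSON.stringify(keyOf(doc));
    // Only the first document seen for each key is kept
    if (!firstByKey.has(key)) firstByKey.set(key, doc);
  }
  return [...firstByKey.values()];
}

const sampleAlbums = [
  { _id: 1, id: "0nEsaNZGpk0HIgY3OGCyR6", title: "some album", artist: "some artist" },
  { _id: 2, id: "5JUSBHF&55sdfhjkf86sd", title: "another album", artist: "another artist" },
  { _id: 3, id: "0nEsaNZGpk0HIgY3OGCyR6", title: "some album", artist: "some artist" }
];

const uniqueAlbums = groupFirst(sampleAlbums, (doc) => doc.id);
console.log(uniqueAlbums.map((doc) => doc._id)); // [ 1, 2 ]
```

Unlike this Map, which preserves insertion order, MongoDB makes no promise about the order in which $group emits its results.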
Modern releases (MongoDB 3.4 and later) have $replaceRoot to clean up the document. If you don't have that, then you can either $project every field or simply use the output under the "doc" property.
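For the $project route, a rough sketch of building that stage from a list of field names follows. The helper name buildProjectFallbackPipeline and the field list are assumptions for illustration; the real documents reportedly carry 10-20 fields, which would all need to be listed.

```javascript
// Hypothetical helper: replaces $replaceRoot with an explicit $project
// that lifts each listed field out of the "doc" sub-document.
function buildProjectFallbackPipeline(keyField, fields) {
  // Restore the original _id, which $group overwrote with the grouping key
  const projection = { _id: "$doc._id" };
  for (const field of fields) {
    projection[field] = "$doc." + field;
  }
  return [
    { $group: { _id: "$" + keyField, doc: { $first: "$$ROOT" } } },
    { $project: projection }
  ];
}

const fallbackPipeline = buildProjectFallbackPipeline("id", ["id", "title", "artist"]);
console.log(JSON.stringify(fallbackPipeline, null, 2));
```

Any $skip and $limit stages would be appended after the $project stage as before.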
Upvotes: 1