Mongoose unique results

Question

I'm looking to form a query that will pull a set of results from my MongoDB database, but remove/ignore results that have a duplicate field value.

Here is the scenario: I'm pulling many results from the Spotify API and storing them in my database. Due to the nature of what I am doing, I end up pulling many of the same albums, which share an id field. Note that this is not the MongoDB _id field.

What I want is to avoid returning multiple copies of the same album when the user builds a query that could include these duplicates.

Here is my current query, which does what I want but doesn't filter out the duplicates:

Albums.aggregate([
    { $match : { source_region : { $in: countries }}},
    { $skip  : offset },
    { $limit : limit }
])

At first I was using the more typical Collection.find().sort() etc. and came across distinct, but you can't use sort, limit etc. with distinct.

I've also tried using $group, but that seems to return only the field I specify, so when I try something like:

{ $group : { _id : null, uniqueValues : { $addToSet : "$id" }}}

the only field that is returned is the id field, when I need about 10-20 fields associated with that album.

If anybody could point me in the right direction that would be great!

Update 1

Here is an example of some documents in the collection:

{
  _id : ObjectId("5ad965a8bc349952904f7f31"),
  id : "0nEsaNZGpk0HIgY3OGCyR6",
  title : "some album",
  artist : "some artist"
},
{
  _id : ObjectId("665fhFHJFjdjfud7d6f6"),
  id : "5JUSBHF&55sdfhjkf86sd",
  title : "another album",
  artist : "another artist"
},
{
  _id : ObjectId("56&DFHJFHJJFJSgh76sdghhsd"),
  id : "0nEsaNZGpk0HIgY3OGCyR6",
  title : "some album",
  artist : "some artist"
}

So if this were my data, I would want to return only one of the documents that share the Spotify-generated id field.

Neil Lunn · Accepted Answer

Since you've gone pretty silent, we'll just have to make some presumptions then.

With nothing to go on other than that you expect "one" property in your documents to define "unique" (other than _id, which already does), what you would do is something like this:

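Albums.aggregate([
  // Group on the property that defines "unique"; keep the first whole document seen per group
  { "$group": {
    "_id": "$uniqueProp",
    "doc": { "$first": "$$ROOT" }
  }},
  // Promote the stored document back to the top level
  { "$replaceRoot": { "newRoot": "$doc" } },
  { "$skip": offset },
  { "$limit": limit }
])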

Or whatever other manipulation you want to do.
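Putting this together with the query from the question might look like the sketch below. It assumes the "unique" property is the Spotify id field shown in the question's update, and it keeps the original $match on source_region. The helper name buildUniqueAlbumsPipeline is hypothetical, and the $sort stage is an addition of this sketch: $group does not guarantee any output order, so paging with $skip and $limit is only repeatable once the results are sorted.

```javascript
// Hypothetical helper: returns the aggregation pipeline as a plain array.
// Assumptions: "id" is the Spotify album id and "source_region" is the
// field matched in the question's original query.
function buildUniqueAlbumsPipeline(countries, offset, limit) {
  return [
    // Same filter as the original query
    { $match: { source_region: { $in: countries } } },
    // One group per Spotify id, keeping the first whole document of each group
    { $group: { _id: "$id", doc: { $first: "$$ROOT" } } },
    // Restore the original document shape
    { $replaceRoot: { newRoot: "$doc" } },
    // Added in this sketch: a fixed order so that paging is repeatable
    { $sort: { id: 1 } },
    { $skip: offset },
    { $limit: limit }
  ];
}

// Usage with the Mongoose model from the question (needs a live connection):
// const albums = await Albums.aggregate(buildUniqueAlbumsPipeline(["US", "GB"], 0, 20));
```

Which duplicate "wins" in each group is arbitrary here, because nothing sorts the documents before the $group stage. If a particular copy should be preferred, a $sort placed before $group would decide it.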

With a $group pipeline stage, the _id property is what determines "uniqueness" of results that you "group by". There is never more than 1 of the same value produced by whatever gets specified in this key. You can even have a compound value:

  { "$group": {
    "_id": { "firstField": "$firstField", "secondField": "$secondField" },
    "doc": { "$first": "$$ROOT" }
  }}

So whatever is in there comes out unique.

Whenever you are "grouping" you need an "accumulator" for anything other than the _id key. So here we use $first to simply take the first result of any value we specify and use $$ROOT here for the whole document.
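To make the idea concrete, here is a small in-memory illustration in plain JavaScript of what grouping on a key and taking $first amounts to. It is not how MongoDB implements $group, only a model of the result; the function name firstPerKey is hypothetical and the sample data mirrors the documents in the question.

```javascript
// Keep the first document seen for each distinct key.
// keyOf may return a single value or an object (a compound key).
function firstPerKey(docs, keyOf) {
  const groups = new Map();
  for (const doc of docs) {
    // Serialise the key so that compound keys compare by value
    const key = JSON.stringify(keyOf(doc));
    if (!groups.has(key)) {
      groups.set(key, doc); // the "$first" document for this group
    }
  }
  return Array.from(groups.values());
}

const sample = [
  { id: "0nEsaNZGpk0HIgY3OGCyR6", title: "some album", artist: "some artist" },
  { id: "5JUSBHF&55sdfhjkf86sd", title: "another album", artist: "another artist" },
  { id: "0nEsaNZGpk0HIgY3OGCyR6", title: "some album", artist: "some artist" }
];

const unique = firstPerKey(sample, (doc) => doc.id);
console.log(unique.length); // 2
```

One difference from the real thing: this sketch returns groups in the order they were first seen, whereas the output order of $group is not guaranteed.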

Modern releases (MongoDB 3.4 and later) have $replaceRoot to clean up the document. If you don't have that, then you can either $project every field or simply use the output under the "doc" property.
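For the $project fallback, a stage along these lines could stand in for $replaceRoot. This is a sketch only: the helper name buildProjectFromDoc is hypothetical, and the field list (id, title, artist) is taken from the sample documents in the question, so a real collection with 10-20 fields would need each one listed.

```javascript
// Hypothetical helper: builds a $project stage that lifts the named fields
// out of the "doc" sub-document produced by the $group stage.
function buildProjectFromDoc(fields) {
  // Restore the original _id instead of the grouping key
  const projection = { _id: "$doc._id" };
  for (const field of fields) {
    projection[field] = "$doc." + field;
  }
  return { $project: projection };
}

// Field names assumed from the sample documents in the question
const projectStage = buildProjectFromDoc(["id", "title", "artist"]);
console.log(JSON.stringify(projectStage));
```

The resulting stage would take the place of the $replaceRoot stage in the pipeline, directly after $group.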
