MarcoS

Reputation: 17721

mongodb: how to filter one document for each group with the same property value?

I have a mongodb collection structure like this one:

var personSchema = new mongoose.Schema({
  _id: ObjectId,
  name: String,
  // ...
  alias: String
});

(I use mongoose, but this is secondary).

Since I fetch people from different sources, some documents can refer to the same person. In this case I want to keep both documents in the database, and I assign the same alias (unique to that person) to both of them.

Currently, when I need to list persons uniquely, I retrieve all people and then filter out duplicate aliases in JavaScript, keeping only one person per alias (I don't care which one). Of course, I also need to keep persons with no alias. Something like this:

Person.find({}, null, function(err, persons) {
  var result = [];
  var aliases = {}; // aliases seen so far, used as a lookup map
  for (var i = 0; i < persons.length; i++) {
    if (persons[i].alias && aliases.hasOwnProperty(persons[i].alias))
      continue;  // skip this person because its alias was seen already
    result.push(persons[i]); // add this person to result
    if (persons[i].alias) // add this person's alias to seen aliases
      aliases[persons[i].alias] = true;
  }
});

Since this becomes quite slow as the number of people grows, I'd like to filter out duplicate aliases (keeping just one person per alias) in the mongo query itself, but I can't work out a filter that fits...

Any clue?

UPDATE: As requested in a comment, I add some sample Person data:

{ "_id" : "1", "name" : "Alice" },
{ "_id" : "2", "name" : "Bob",   "alias" : "afa776bea788cf4c" },
{ "_id" : "3", "name" : "Bobby", "alias" : "afa776bea788cf4c" },
{ "_id" : "4", "name" : "Zoe",   "alias" : "2211293acc82329a" },

From the query I'm looking for, I'd need to get:

{ "_id" : "1", "name" : "Alice" },
{ "_id" : "2", "name" : "Bob",   "alias" : "afa776bea788cf4c" },
{ "_id" : "4", "name" : "Zoe",   "alias" : "2211293acc82329a" },

(getting "Bobby" instead of "Bob" would be fine too).
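For reference, the in-memory filtering described above can be sketched as a standalone function and checked against the sample data. The names `uniqueByAlias`, `samplePersons` and `uniquePersons` are hypothetical and only used for illustration:

```javascript
// Keep the first person seen for each alias.
// Persons without an alias are always kept.
function uniqueByAlias(persons) {
  const seen = new Set(); // aliases already represented in the result
  return persons.filter(function (person) {
    if (!person.alias) return true;            // no alias: nothing to deduplicate
    if (seen.has(person.alias)) return false;  // alias already seen: skip
    seen.add(person.alias);
    return true;
  });
}

// Sample data from the question
const samplePersons = [
  { _id: '1', name: 'Alice' },
  { _id: '2', name: 'Bob', alias: 'afa776bea788cf4c' },
  { _id: '3', name: 'Bobby', alias: 'afa776bea788cf4c' },
  { _id: '4', name: 'Zoe', alias: '2211293acc82329a' }
];

const uniquePersons = uniqueByAlias(samplePersons);
```

With the sample data this keeps Alice, Bob and Zoe, which is the result the mongo query should reproduce.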

This data structure is not mandatory; I'd accept a suggestion to change it.

Upvotes: 0

Views: 1221

Answers (3)

Volodymyr Synytskyi

Reputation: 4055

You can do it using mongo aggregation.

As far as I understand, there are documents without the alias field. If that is not the case, you don't need the first $project stage.

Person.aggregate([
    {   $project: {
            alias: {$ifNull: ['$alias', "$_id"] },
            name: 1
        }
    },
    {   $group: { _id: "$alias", name: {$first: "$name"}}},
    {
        $project: {_id:0, name: 1}
    }
], callback);
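The pipeline above returns only names. If whole documents are needed, a variant along the following lines may work. It is only a sketch: it assumes MongoDB 3.4 or later (for `$replaceRoot`) and that no alias value ever equals another document's `_id`. The name `uniquePersonsPipeline` is hypothetical.

```javascript
const uniquePersonsPipeline = [
  {
    $group: {
      // Group by alias, falling back to _id for persons with no alias,
      // so that each alias-less person forms its own group.
      _id: { $ifNull: ['$alias', '$_id'] },
      // Keep one whole document per group.
      person: { $first: '$$ROOT' }
    }
  },
  // Promote the kept document back to the top level.
  { $replaceRoot: { newRoot: '$person' } }
];

// Assumed usage with the mongoose model from the question:
// Person.aggregate(uniquePersonsPipeline, callback);
```

Without a preceding `$sort`, which document `$first` picks within a group is not guaranteed, which matches the requirement that either "Bob" or "Bobby" is acceptable.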

Upvotes: 0

Isaiah4110

Reputation: 10100

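Using aggregation, you can use the following $group query to get the desired list. Note that documents with no alias are all grouped together under a null _id, so only one of them is returned:

db.collection.aggregate([
  { $group: { "_id": "$alias", "name": { $first: "$name" }, "id": { $first: "$_id" } } },
  { $project: { "id": 1, "_id": 0, "alias": "$_id", "name": 1 } }
]);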

Upvotes: 1

Quy

Reputation: 1373

Try the Model.distinct operation.

http://mongoosejs.com/docs/api.html#query_Query-distinct

Person.distinct('alias', callback);

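This returns the list of distinct alias values, not the documents themselves, so a further query is still needed to fetch one person per alias, along with the persons that have no alias.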

Upvotes: 0
