Cybersupernova

Reputation: 1839

Mongo Distinct Query with full row object

First of all, I'm new to MongoDB, so I don't know much, and I can't simply remove the duplicate rows because of some dependencies.

I have the following data stored in MongoDB:

{'id': 1, 'key': 'qscderftgbvqscderftgbvqscderftgbvqscderftgbvqscderftgbv', 'name': 'some name', 'country': 'US'},
{'id': 2, 'key': 'qscderftgbvqscderftgbvqscderftgbvqscderftgbvqscderftgbv', 'name': 'some name', 'country': 'US'},
{'id': 3, 'key': 'pehnvosjijipehnvosjijipehnvosjijipehnvosjijipehnvosjiji', 'name': 'some name', 'country': 'IN'},
{'id': 4, 'key': 'pfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnew', 'name': 'some name', 'country': 'IN'},
{'id': 5, 'key': 'pfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnew', 'name': 'some name', 'country': 'IN'}

As you can see, some of the rows are duplicates with different ids. Fixing this on the input side will take a long time, so for now I must handle it on output.

I need the data in the following way:

{'id': 1, 'key': 'qscderftgbvqscderftgbvqscderftgbvqscderftgbvqscderftgbv', 'name': 'some name', 'country': 'US'},
{'id': 3, 'key': 'pehnvosjijipehnvosjijipehnvosjijipehnvosjijipehnvosjiji', 'name': 'some name', 'country': 'IN'},
{'id': 4, 'key': 'pfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnew', 'name': 'some name', 'country': 'IN'}

My query

keys = db.collection.distinct('key', {})
all_data = db.collection.find({'key': {$in: keys}})

As you can see, it takes two queries to build a single result set. Please combine them into one, as the database is very large.

I could also create a unique index on key, but the value is so long (152 characters) that I doubt it would help.

Or would it?

Upvotes: 4

Views: 955

Answers (1)

Alex

Reputation: 21766

You need to use the aggregation framework for this. There are multiple ways to do it; the solution below uses the $$ROOT variable to get the first document in each group:

db.data.aggregate([{
  "$sort": {
    "_id": 1
  }
}, {
  "$group": {
    "_id": "$key",
    "first": {
      "$first": "$$ROOT"
    }
  }
}, {
  "$project": {
    "_id": 0,
    "id":"$first.id",
    "key":"$first.key",
    "name":"$first.name",
    "country":"$first.country"
  }
}])
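
As a possible variation on the pipeline above, here is a minimal sketch that avoids listing every field by hand. It assumes MongoDB 3.4 or later (where the $replaceRoot stage is available) and assumes the sample's id field, rather than _id, decides which duplicate counts as the first one; the variable name pipeline is only illustrative. A trailing $sort is included because $group does not guarantee the order of its output.

```javascript
// Sketch only: assumes MongoDB 3.4+ (for $replaceRoot) and that the sample's
// `id` field defines which duplicate counts as "first".
const pipeline = [
  // Order documents so that $first picks the lowest id in each group.
  { $sort: { id: 1 } },
  // One group per distinct key; keep the whole first document.
  { $group: { _id: "$key", first: { $first: "$$ROOT" } } },
  // Promote the saved document back to the top level.
  { $replaceRoot: { newRoot: "$first" } },
  // $group output order is not guaranteed, so sort again for stable results.
  { $sort: { id: 1 } }
];

// In the mongo shell: db.data.aggregate(pipeline)
```

Unlike the explicit $project above, this version returns each document with all of its stored fields, including its original _id, so add a projection afterwards if that field should be hidden.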

Upvotes: 5
