I have to work with MongoDB for my job, but I'm not very comfortable with it. I need to gather some documents and remove duplicates based on one field.
Here is a (very, very) simplified structure of a document:
{
    'user': 'The User',
    'report': {
        'id' : 0
        ...
    }
}
A user can have several reports, including several identical reports (this is not a design mistake; only the simplified structure makes it look strange).
A report is only related to one user.
I would like to retrieve a set of user/report pairs with all duplicate report IDs removed. Here is an example:
# Data
User | Report ID
--------|----------
User1 | AAAA
User1 | AAAA
User1 | BBBB
User2 | CCCC
User3 | DDDD
User3 | DDDD
# Expected output, where each line represents a document
User | Report ID
--------|----------
User1 | AAAA
User1 | BBBB
User2 | CCCC
User3 | DDDD
I am really confused by all the aggregation operators. How can I do this?
This is pretty straightforward using the $group operator in an aggregation pipeline.
First, my sample data:
[
{ 'user': 'User1', report: { id: 'AAAA' } },
{ 'user': 'User1', report: { id: 'BBBB' } },
{ 'user': 'User1', report: { id: 'AAAA' } },
{ 'user': 'User2', report: { id: 'CCCC' } },
{ 'user': 'User3', report: { id: 'DDDD' } },
{ 'user': 'User3', report: { id: 'DDDD' } }
]
To get the expected format you posted, you can run the following query:
db.reports.aggregate([
    {
        $group: {
            _id: "$report.id",
            user: {
                $first: '$user'
            }
        }
    },
    {
        $project: {
            _id: 0,
            User: '$user',
            Report: '$_id'
        }
    }
])
The first step in this aggregation pipeline groups all of the documents in your collection by report.id. Notice the dot notation used to reference a field in the embedded document. It also sets the user field of each group to the value of the user field in the first document MongoDB finds with that report ID. You mentioned that a report is only related to one user, so every document in a group has the same user and this shouldn't cause any problems.
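If the aggregation operators still feel abstract, the grouping step can be mirrored in plain JavaScript. This is only a sketch that emulates the semantics in memory (it does not talk to MongoDB), and the function name `groupByReportId` is hypothetical. One difference to keep in mind: a JavaScript `Map` preserves insertion order, whereas MongoDB's `$group` does not guarantee any output order, and `$first` is only deterministic when the documents arrive in a defined order (e.g. after a `$sort`). That is harmless here because all documents sharing a report ID have the same user.

```javascript
// Plain-JavaScript emulation of the $group stage:
//   { $group: { _id: "$report.id", user: { $first: "$user" } } }
// NOT MongoDB code: groupByReportId is a hypothetical helper for illustration.
function groupByReportId(docs) {
  const groups = new Map(); // report.id -> { _id, user }
  for (const doc of docs) {
    const key = doc.report.id; // the same field the "$report.id" path refers to
    // $first keeps the value from the first document seen for each key,
    // so later duplicates with the same report ID are ignored
    if (!groups.has(key)) {
      groups.set(key, { _id: key, user: doc.user });
    }
  }
  return [...groups.values()];
}

// Same sample data as above
const sample = [
  { user: 'User1', report: { id: 'AAAA' } },
  { user: 'User1', report: { id: 'BBBB' } },
  { user: 'User1', report: { id: 'AAAA' } },
  { user: 'User2', report: { id: 'CCCC' } },
  { user: 'User3', report: { id: 'DDDD' } },
  { user: 'User3', report: { id: 'DDDD' } }
];

console.log(groupByReportId(sample));
```

Each report ID appears exactly once in the result, paired with its user, which is what the `$group` stage produces before the renaming step.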
The second step in this aggregation pipeline just renames the fields to the names you used in your expected format. The $group operator sets the _id field of each output document to the value you grouped by (in this case, report.id). The $project stage uses that value to set the Report field and excludes _id from the output (_id: 0).