Reputation: 683
I'm having group of elements in MongoDB as given below:
/* 1 */
{
"_id" : ObjectId("58736c7f7d43c305461cdb9b"),
"Name" : "Kevin",
"pb_event" : [
{
"event_type" : "Birthday",
"event_date" : "2014-08-31"
},
{
"event_type" : "Anniversary",
"event_date" : "2014-08-31"
}
]
}
/* 2 */
{
"_id" : ObjectId("58736cfc7d43c305461cdba8"),
"Name" : "Peter",
"pb_event" : [
{
"event_type" : "Birthday",
"event_date" : "2014-08-31"
},
{
"event_type" : "Anniversary",
"event_date" : "2015-03-24"
}
]
}
/* 3 */
{
"_id" : ObjectId("58736cfc7d43c305461cdba9"),
"Name" : "Pole",
"pb_event" : [
{
"event_type" : "Birthday",
"event_date" : "2015-03-24"
},
{
"event_type" : "Work Anniversary",
"event_date" : "2015-03-24"
}
]
}
Now I want the result that has group on event_date
then after group on event_type
. event_type
contain all names of the related user, then count of records in the respective array.
Expected Output
/* 1 */
{
"event_date" : "2014-08-31",
"data" : [
{
"event_type" : "Birthday",
"details" : [
{
"_id" : ObjectId("58736c7f7d43c305461cdb9b"),
"name" : "Kevin"
},
{
"_id" : ObjectId("58736cfc7d43c305461cdba8"),
"name" : "Peter"
}
],
"count" : 2
},
{
"event_type" : "Anniversary",
"details" : [
{
"_id" : ObjectId("58736c7f7d43c305461cdb9b"),
"name" : "Kevin"
}
],
"count" : 1
}
]
}
/* 2 */
{
"event_date" : "2015-03-24",
"data" : [
{
"event_type" : "Anniversary",
"details" : [
{
"_id" : ObjectId("58736cfc7d43c305461cdba8"),
"name" : "Peter"
}
],
"count" : 1
},
{
"event_type" : "Birthday",
"details" : [
{
"_id" : ObjectId("58736cfc7d43c305461cdba9"),
"name" : "Pole"
}
],
"count" : 1
},
{
"event_type" : "Work Anniversary",
"details" : [
{
"_id" : ObjectId("58736cfc7d43c305461cdba9"),
"name" : "Pole"
}
],
"count" : 1
}
]
}
Upvotes: 5
Views: 3443
Reputation: 103305
Using the aggregation framework, you would need to run a pipeline that has the following stages so that you get the desired result:
db.collection.aggregate([
{ "$unwind": "$pb_event" },
{
"$group": {
"_id": {
"event_date": "$pb_event.event_date",
"event_type": "$pb_event.event_type"
},
"details": {
"$push": {
"_id": "$_id",
"name": "$Name"
}
},
"count": { "$sum": 1 }
}
},
{
"$group": {
"_id": "$_id.event_date",
"data": {
"$push": {
"event_type": "$_id.event_type",
"details": "$details",
"count": "$count"
}
}
}
},
{
"$project": {
"_id": 0,
"event_date": "$_id",
"data": 1
}
}
])
In the above pipeline, the first step is the $unwind
operator
{ "$unwind": "$pb_event" }
which comes in quite handy when the data is stored as an array. When the unwind operator is applied on a list data field, it will generate a new record for each and every element of the list data field on which unwind is applied. It basically flattens the data.
This is a necessary operation for the next pipeline stage, the $group
step where you group the flattened documents by the deconstructed pb_event
array fields event_date
and event_type
:
{
"$group": {
"_id": {
"event_date": "$pb_event.event_date",
"event_type": "$pb_event.event_type"
},
"details": {
"$push": {
"_id": "$_id",
"name": "$Name"
}
},
"count": { "$sum": 1 }
}
},
The $group
pipeline operator is similar to the SQL's GROUP BY
clause. In SQL, you can't use GROUP BY
unless you use any of the aggregation functions. The same way, you have to use an aggregation function in MongoDB (called an accumulator operator) as well. You can read more about the aggregation functions here.
In this $group
operation, the logic to calculate the count aggregate i.e. the total number of documents in the group using the $sum
accumulator operator. Within the same pipeline, you can aggregate a list of the name
and _id
subdocuments by using the $push
operator which returns an array of expression values for each group.
The preceding $group
pipeline
{
"$group": {
"_id": "$_id.event_date",
"data": {
"$push": {
"event_type": "$_id.event_type",
"details": "$details",
"count": "$count"
}
}
}
}
will further aggregate the results from the last pipeline by grouping on the event_date
, which forms basis of the desired output by creating a new data list using $push
and then the final $project
pipeline stage
{
"$project": {
"_id": 0,
"event_date": "$_id",
"data": 1
}
}
reshapes the documents fields by renaming the _id
field to event_date
and retaining the other field.
Upvotes: 4