5a01d01P

Reputation: 683

Multiple Nested Group Within Array

I have a collection of documents in MongoDB, as shown below:

/* 1 */
{
    "_id" : ObjectId("58736c7f7d43c305461cdb9b"),
    "Name" : "Kevin",
    "pb_event" : [ 
        {
            "event_type" : "Birthday",
            "event_date" : "2014-08-31"
        }, 
        {
            "event_type" : "Anniversary",
            "event_date" : "2014-08-31"
        }
    ]
}

/* 2 */
{
    "_id" : ObjectId("58736cfc7d43c305461cdba8"),
    "Name" : "Peter",
    "pb_event" : [ 
        {
            "event_type" : "Birthday",
            "event_date" : "2014-08-31"
        }, 
        {
            "event_type" : "Anniversary",
            "event_date" : "2015-03-24"
        }
    ]
}

/* 3 */
{
    "_id" : ObjectId("58736cfc7d43c305461cdba9"),
    "Name" : "Pole",
    "pb_event" : [ 
        {
            "event_type" : "Birthday",
            "event_date" : "2015-03-24"
        }, 
        {
            "event_type" : "Work Anniversary",
            "event_date" : "2015-03-24"
        }
    ]
}

Now I want a result that is grouped by event_date and then, within each date, by event_type. Each event_type group should list the related users (their _id and name), followed by a count of the records in that list.

Expected Output

/* 1 */
{    
    "event_date" : "2014-08-31",
    "data" : [ 
        {
            "event_type" : "Birthday",
            "details" : [ 
                {
                    "_id" : ObjectId("58736c7f7d43c305461cdb9b"),
                    "name" : "Kevin"
                }, 
                {
                    "_id" : ObjectId("58736cfc7d43c305461cdba8"),
                    "name" : "Peter"
                }
            ],
            "count" : 2
        }, 
        {
            "event_type" : "Anniversary",
            "details" : [ 
                {
                    "_id" : ObjectId("58736c7f7d43c305461cdb9b"),
                    "name" : "Kevin"
                }
            ],
            "count" : 1
        }
    ]
}

/* 2 */
{
    "event_date" : "2015-03-24",
    "data" : [ 
        {
            "event_type" : "Anniversary",
            "details" : [ 
                {
                    "_id" : ObjectId("58736cfc7d43c305461cdba8"),
                    "name" : "Peter"
                }
            ],
            "count" : 1
        }, 
        {
            "event_type" : "Birthday",
            "details" : [ 
                {
                    "_id" : ObjectId("58736cfc7d43c305461cdba9"),
                    "name" : "Pole"
                }
            ],
            "count" : 1
        }, 
        {
            "event_type" : "Work Anniversary",
            "details" : [ 
                {
                    "_id" : ObjectId("58736cfc7d43c305461cdba9"),
                    "name" : "Pole"
                }
            ],
            "count" : 1
        }
    ]
}

Upvotes: 5

Views: 3443

Answers (1)

chridam

Reputation: 103305

Using the aggregation framework, you would need to run a pipeline that has the following stages so that you get the desired result:

db.collection.aggregate([
    { "$unwind": "$pb_event" },
    {
        "$group": {
            "_id": {
                "event_date": "$pb_event.event_date",
                "event_type": "$pb_event.event_type" 
            },            
            "details": {
                "$push": {
                    "_id": "$_id",
                    "name": "$Name"
                }
            },
            "count": { "$sum": 1 }            
        }
    },    
    {
        "$group": {
            "_id": "$_id.event_date",            
            "data": {
                "$push": {
                    "event_type": "$_id.event_type",
                    "details": "$details",
                    "count": "$count"
                }
            }           
        }
    },
    {
        "$project": {
            "_id": 0,
            "event_date": "$_id",
            "data": 1
        }
    }
])

In the above pipeline, the first step is the $unwind operator

{ "$unwind": "$pb_event" }

which comes in quite handy when the data is stored as an array. When $unwind is applied to an array field, it outputs a new document for each element of that array, with the array field replaced by that single element. It basically flattens the data.
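To make the flattening concrete, here is a plain JavaScript sketch (no MongoDB required) that mimics what $unwind does to two of the sample documents. The unwind helper is hypothetical and written only for illustration; it is not part of any MongoDB driver, the _id values are shortened to plain strings, and edge cases such as missing or empty arrays are ignored.

```javascript
// Two of the sample documents, with _id shortened to a plain string (assumption for brevity).
const docs = [
  { _id: "1", Name: "Kevin", pb_event: [
    { event_type: "Birthday", event_date: "2014-08-31" },
    { event_type: "Anniversary", event_date: "2014-08-31" },
  ] },
  { _id: "2", Name: "Peter", pb_event: [
    { event_type: "Birthday", event_date: "2014-08-31" },
    { event_type: "Anniversary", event_date: "2015-03-24" },
  ] },
];

// Hypothetical helper: emit one copy of the document per array element,
// with the array field replaced by that single element.
function unwind(documents, field) {
  return documents.flatMap((doc) =>
    doc[field].map((element) => ({ ...doc, [field]: element }))
  );
}

const flattened = unwind(docs, "pb_event");
console.log(flattened.length); // 4
console.log(flattened[1]);     // Kevin's document with only the Anniversary event
```

Each output document keeps the parent's _id and Name, which is what lets the later $group stage push them into the details array.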

This is a necessary operation for the next pipeline stage, the $group step where you group the flattened documents by the deconstructed pb_event array fields event_date and event_type:

{
    "$group": {
        "_id": {
            "event_date": "$pb_event.event_date",
            "event_type": "$pb_event.event_type" 
        },            
        "details": {
            "$push": {
                "_id": "$_id",
                "name": "$Name"
            }
        },
        "count": { "$sum": 1 }            
    }
},

The $group pipeline operator is similar to SQL's GROUP BY clause. In SQL, GROUP BY is normally paired with aggregate functions such as COUNT or SUM. In the same way, $group in MongoDB computes its output fields with aggregation functions, called accumulator operators. You can read more about them in the MongoDB documentation on accumulator operators.

In this $group operation, the count aggregate, i.e. the total number of documents in each group, is calculated with the $sum accumulator operator. Within the same stage, you can collect a list of the name and _id subdocuments by using the $push operator, which returns an array of expression values for each group.
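As a rough illustration of what this stage computes, the following plain JavaScript sketch groups already-unwound documents by the compound key and emulates $push and $sum by hand. It is not how MongoDB implements $group; the variable names are assumptions made for the example and the _id values are shortened to plain strings.

```javascript
// Documents as they look after $unwind (_id shortened to a plain string).
const unwound = [
  { _id: "1", Name: "Kevin", pb_event: { event_type: "Birthday", event_date: "2014-08-31" } },
  { _id: "1", Name: "Kevin", pb_event: { event_type: "Anniversary", event_date: "2014-08-31" } },
  { _id: "2", Name: "Peter", pb_event: { event_type: "Birthday", event_date: "2014-08-31" } },
  { _id: "2", Name: "Peter", pb_event: { event_type: "Anniversary", event_date: "2015-03-24" } },
];

// Group by the compound key (event_date, event_type).
const groups = new Map();
for (const doc of unwound) {
  const { event_date, event_type } = doc.pb_event;
  const key = JSON.stringify([event_date, event_type]);
  if (!groups.has(key)) {
    groups.set(key, { _id: { event_date, event_type }, details: [], count: 0 });
  }
  const group = groups.get(key);
  group.details.push({ _id: doc._id, name: doc.Name }); // plays the role of $push
  group.count += 1;                                     // plays the role of { $sum: 1 }
}

const firstStage = [...groups.values()];
console.log(JSON.stringify(firstStage, null, 2));
```

The Map here keeps insertion order, whereas MongoDB does not guarantee the order of documents coming out of $group; add a $sort stage if the order matters.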

The second $group pipeline stage

{
    "$group": {
        "_id": "$_id.event_date",            
        "data": {
            "$push": {
                "event_type": "$_id.event_type",
                "details": "$details",
                "count": "$count"
            }
        }           
    }
}

will further aggregate the results from the previous stage by grouping on event_date, which forms the basis of the desired output by creating a new data list using $push, and then the final $project pipeline stage

{
    "$project": {
        "_id": 0,
        "event_date": "$_id",
        "data": 1
    }
}

reshapes the documents by renaming the _id field to event_date and retaining the data field.
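The last two stages can be sketched in plain JavaScript as well. This is only an emulation under the same assumptions as before (shortened _id strings, hypothetical variable names); it takes the kind of documents the first $group stage produces and regroups them by date.

```javascript
// Output of the first $group stage (_id values shortened to plain strings).
const perDateAndType = [
  { _id: { event_date: "2014-08-31", event_type: "Birthday" },
    details: [{ _id: "1", name: "Kevin" }, { _id: "2", name: "Peter" }], count: 2 },
  { _id: { event_date: "2014-08-31", event_type: "Anniversary" },
    details: [{ _id: "1", name: "Kevin" }], count: 1 },
  { _id: { event_date: "2015-03-24", event_type: "Anniversary" },
    details: [{ _id: "2", name: "Peter" }], count: 1 },
];

// Second $group: key on event_date only, pushing one entry per event_type.
const byDate = new Map();
for (const g of perDateAndType) {
  const date = g._id.event_date;
  if (!byDate.has(date)) byDate.set(date, []);
  byDate.get(date).push({
    event_type: g._id.event_type,
    details: g.details,
    count: g.count,
  });
}

// $project: expose the grouping key as event_date and keep the data array.
const result = [...byDate].map(([event_date, data]) => ({ event_date, data }));
console.log(JSON.stringify(result, null, 2));
```

The result has one document per event_date, each holding a data array with the per-type details and counts, which matches the shape of the expected output in the question.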

Upvotes: 4
