Srinivas
Srinivas

Reputation: 1556

How to use nested grouping in MongoDB

I need to find total count of duplicate profiles per organization level. I have documents as shown below:

{
    "OrganizationId" : 10,
    "Profile" : {
        "_id" : "75"
    }
    "_id" : "1"
},
{
    "OrganizationId" : 10,
    "Profile" : {
        "_id" : "75"
    }
    "_id" : "2"
},
{
    "OrganizationId" : 10,
    "Profile" : {
        "_id" : "77"
    }
    "_id" : "3"
},
{
    "OrganizationId" : 10,
    "Profile" : {
        "_id" : "77"
    }
    "_id" : "4"
}

I have written query which is a group by ProfileId and OrganizationId. The results i am getting as shown below:

Organization    Total
10               2
10               2

But i want to get the sum of total per organization level, that means Org 10 should have one row with sum of 4.

The query i am using as shown below:

 db.getSiblingDB("dbName").OrgProfile.aggregate(
 { $project: { _id: 1, P: "$Profile._id",  O: "$OrganizationId" } },
 { $group: {_id: { p: "$P", o: "$O"}, c: { $sum: 1 }} },
 { $match: { c: { $gt: 1 } } });

Any ideas ? Please help

Upvotes: 1

Views: 846

Answers (2)

DAXaholic
DAXaholic

Reputation: 35408

The following pipeline should give you the desired output, whereas the last $project stage is just for cosmetic purposes to turn _id into OrganizationId but is not needed for the essential computation so you may omit it.

db.getCollection('yourCollection').aggregate([
    { 
        $group: {  
            _id: { org: "$OrganizationId", profile: "$Profile._id" },
            count: { $sum: 1 }
        }
    },
    {
        $group: {
            _id: "$_id.org",
            Total: { 
                $sum: { 
                    $cond: { 
                        if: { $gte: ["$count", 2] }, 
                        then: "$count", 
                        else: 0
                    }
                }
            }
        } 
     },
     {
         $project: {
             _id: 0,
             Organization: "$_id",
             Total: 1
         }
     }
])

gives this output

{
    "Total" : 4.0,
    "Organization" : 10
}

To filter out organizations without duplicates you can use $match which will also result in a simplification of the second $group stage

...aggregate([
    { 
        $group: {  
            _id: { org: "$OrganizationId", profile: "$Profile._id" },
            count: { $sum: 1 }
        }
    },
    {
        $match: {
            count: { $gte: 2 } 
        }
    },
    {
        $group: {
            _id: "$_id.org",
            Total: { $sum: "$count" }
        } 
     },
     {
         $project: {
             _id: 0,
             Organization: "$_id",
             Total: 1
         }
     }
])

Upvotes: 3

Hayden Braxton
Hayden Braxton

Reputation: 1161

I think I have a solution for you. In that last step there, instead of matching, I think you want another $group.

    .aggregate([

     { $project: { _id: 1, P: "$Profile._id",  O: "$OrganizationId" } }
     ,{ $group: {_id: { p: "$P", o: "$O"}, c: { $sum: 1 }} }
     ,{ $group: { _id: "$_id.o" , c: {  $sum: "$c" } }}

     ]);

You can probably read it and figure out yourself what's happening in that last step, but just in case I'll explain. the last step is group all documents that have the same organization id, and then summing the quantity specified by the previous c field. After the first group, you had two documents that both had a count c of 2 but different profile id. The next group ignores the profile id and just groups them if they have the same organization id and adds their counts.

When I ran this query, here is my result, which is what I think you're looking for:

{
    "_id" : 10,
    "c" : 4
}

Hope this helps. Let me know if you have any questions.

Upvotes: 0

Related Questions