Shajesh J
Shajesh J

Reputation: 3

Retrieving a count that matches specified criteria in a $group aggregation

So I am looking to group documents in my collection on a specific field, and for the output results of each group, I am looking to include the following:

  1. A count of all documents in the group that match a specific query (i.e. a count of documents that satisfy some expression { "$Property": "Value" })
  2. The total number of documents in the group
  3. (Bonus, as I suspect that this is not easily accomplished) Properties of a document that correspond to a $min/$max accumulator

I am very new to the syntax used to query in mongo and don't quite understand how it all works, but after some research, I've managed to get it down to the following query (please note, I am currently using version 3.0.12 for my mongo db, but I believe we will upgrade in a couple of months time):

db.getCollection('myCollection').aggregate(
    [
        {
            $group: {
                _id: {
                    GroupID: "$GroupID", 
                    Status: "$Status"
                }, 
                total: { $sum: 1 }, 
                GroupName: { $first: "$GroupName" }, 
                EarliestCreatedDate: { $min: "$DateCreated" }, 
                LastModifiedDate: { $max: "$LastModifiedDate" }
            }
        }, 
        {
            $group: {
                _id: "$_id.GroupID", 
                Statuses: {
                    $push: {
                        Status: "$_id.Status", 
                        Count: "$total"
                    }
                }, 
                TotalCount: { $sum: "$total" }, 
                GroupName: { $first: "$GroupName" }, 
                EarliestCreatedDate: { $min: "$EarliestCreatedDate" }, 
                LastModifiedDate: { $max: "$LastModifiedDate" }
            }
        }
    ]
)

Essentially what I am looking to retrieve is the Count for specific Status values, and project them into one final result document that looks like the following:

{
    GroupName, 
    EarliestCreatedDate, 
    EarliestCreatedBy, 
    LastModifiedDate, 
    LastModifiedBy, 
    TotalCount,
    PendingCount, 
    ClosedCount
}

Where PendingCount and ClosedCount are the total number of documents in each group that have a status Pending/Closed. I suspect I need to use $project with some other expression to extract this value, but I don't really understand the aggregation pipeline well enough to figure this out.

Also the EarliestCreatedBy and LastModifiedBy are the users who created/modified the document(s) corresponding to the EarliestCreatedDate and LastModifiedDate respectively. As I mentioned, I think retrieving these values will add another layer of complexity, so if there is no practical solution, I am willing to forgo this requirement.

Any suggestions/tips would be very much appreciated.

Upvotes: 0

Views: 1539

Answers (1)

s7vr
s7vr

Reputation: 75964

You can try below aggregation stages.

$group

Calculate all the necessary counts TotalCount, PendingCount and ClosedCount for each GroupID

Calculate $min and $max for EarliestCreatedDate and LastModifiedDate respectively and push all the fields to CreatedByLastModifiedBy to be compared later for fetching EarliestCreatedBy and LastModifiedBy for each GroupID

$project

Project all the fields for response

$filter the EarliestCreatedDate value against the data in the CreatedByLastModifiedBy and $map the matching CreatedBy to the EarliestCreatedBy and $arrayElemAt to convert the array to object.

Similar steps for calculating LastModifiedBy

db.getCollection('myCollection').aggregate(
    [{
        $group: {
            _id: "$GroupID",
            TotalCount: {
                $sum: 1
            },
            PendingCount: {
                $sum: {
                    $cond: {
                        if: {
                            $eq: ["Status", "Pending"]
                        },
                        then: 1,
                        else: 0
                    }
                }
            },
            ClosedCount: {
                $sum: {
                    $cond: {
                        if: {
                            $eq: ["Status", "Closed "]
                        },
                        then: 1,
                        else: 0
                    }
                }
            },
            GroupName: {
                $first: "$GroupName"
            },
            EarliestCreatedDate: {
                $min: "$DateCreated"
            },
            LastModifiedDate: {
                $max: "$LastModifiedDate"
            },
            CreatedByLastModifiedBy: {
                $push: {
                    CreatedBy: "$CreatedBy",
                    LastModifiedBy: "$LastModifiedBy",
                    DateCreated: "$DateCreated",
                    LastModifiedDate: "$LastModifiedDate"
                }
            }
        }
    }, {
        $project: {
            _id: 0,
            GroupName: 1,
            EarliestCreatedDate: 1,
            EarliestCreatedBy: {
                $arrayElemAt: [{
                    $map: {
                        input: {
                            $filter: {
                                input: "$CreatedByLastModifiedBy",
                                as: "CrBy",
                                cond: {
                                    "$eq": ["$EarliestCreatedDate", "$$CrBy.DateCreated"]
                                }
                            }
                        },
                        as: "EaCrBy",
                        in: {
                            "$$EaCrBy.CreatedBy"
                        }
                    }
                }, 0]
            },
            LastModifiedDate: 1,
            LastModifiedBy: {
                $arrayElemAt: [{
                    $map: {
                        input: {
                            $filter: {
                                input: "$CreatedByLastModifiedBy",
                                as: "MoBy",
                                cond: {
                                    "$eq": ["$LastModifiedDate", "$$MoBy.LastModifiedDate"]
                                }
                            }
                        },
                        as: "LaMoBy",
                        in: {
                            "$$LaMoBy.LastModifiedBy"
                        }
                    }
                }, 0]
            },
            TotalCount: 1,
            PendingCount: 1,
            ClosedCount: 1
        }
    }]
)

Update for Version < 3.2

$filter is also not available in your version. Below is the equivalent.

The comparison logic is the same and creates an array with for every non matching entry the value of false or LastModifiedBy otherwise.

Next step is to use $setDifference to compare the previous array values with array [false] which returns the elements that only exist in the first set.

LastModifiedBy: {
    $setDifference: [{
            $map: {
                input: "$CreatedByLastModifiedBy",
                as: "MoBy",
                in: {
                    $cond: [{
                            $eq: ["$LastModifiedDate", "$$MoBy.LastModifiedDate"]
                        },
                        "$$MoBy.LastModifiedBy",
                        false
                    ]
                }
            }
        },
        [false]
    ]
}

Add $unwind stage after $project stage to change to object

{$unwind:"$LastModifiedBy"}

Similar steps for calculating EarliestCreatedBy

Upvotes: 1

Related Questions