swaglord mcmuffin'

Reputation: 236

Double aggregation with distinct count in MongoDB

We have a collection which stores log documents.

Is it possible to have multiple aggregations on different attributes?

In its simplest form, a document looks like this:

{
   _id : int,
   agent : string,
   username: string,
   date : string,
   type : int,
   subType: int
}

With the following query I can easily count all documents and group them by subtype for a specific type during a specific time period:

db.logs.aggregate([
    {
        $match: {
            $and : [
                {"date" : { $gte : new ISODate("2020-11-27T00:00:00.000Z")}}
                ,{"date" : { $lte : new ISODate("2020-11-27T23:59:59.000Z")}}
                ,{"type" : 906}
            ]
        }
    },
    {
        $group: {
            "_id" : '$subType',
            count: { "$sum": 1 }
        }
    }
])

My output so far is perfect:

{
   _id: 4,
   count: 5
}

However, I also want a second counter: a distinct count, added as a third attribute.

Say I want to extend the result set above with the distinct count of usernames. Each result would then contain the subType as _id, the total number of documents, and the number of distinct usernames that have entries. In my case, that is the number of people who have created documents.

A "pseudo resultset" would look like:

{
   _id: 4,
   countOfDocumentsOfSubstype4: 5,
   distinctCountOfUsernamesInDocumentsWithSubtype4: ?
}

Does this make any sense?

Please help me improve the question as well, since it's difficult to google it when you're not a MongoDB expert.

Upvotes: 0

Views: 380

Answers (1)

ray

Reputation: 15287

You can first group at the finest level (subType and username together), then group a second time by subType alone to get both counts:

db.logs.aggregate([
    {
        $match: {
            $and : [
                {"date" : { $gte : new ISODate("2020-11-27T00:00:00.000Z")}}
                ,{"date" : { $lte : new ISODate("2020-11-27T23:59:59.000Z")}}
                ,{"type" : 906}
            ]
        }
    },
    {
        $group: {
            "_id" : {
                subType : "$subType",
                username : "$username"
            },
            count: { "$sum": 1 }
        }
    },
    {
        $group: {
            "_id" : "$_id.subType",
            "countOfDocumentsOfSubstype4" : {$sum : "$count"},
            "distinctCountOfUsernamesInDocumentsWithSubtype4" : {$sum : 1}
        }
    }
])
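To see why the two $group stages yield both counters, here is a minimal in-memory sketch of the same idea in plain JavaScript, with no MongoDB involved. The sample documents, the helper names groupBySubTypeAndUser and groupBySubType, and the output field names documentCount and distinctUsernameCount are all hypothetical and only serve as illustration; the documents are assumed to have already passed the $match stage.

```javascript
// Hypothetical sample documents (not from the original post),
// assumed to have already been filtered by the $match stage.
const docs = [
  { username: "alice", subType: 4 },
  { username: "alice", subType: 4 },
  { username: "bob", subType: 4 },
  { username: "bob", subType: 4 },
  { username: "carol", subType: 4 },
  { username: "alice", subType: 7 },
];

// Mimics the first $group: one entry per (subType, username) pair,
// with the number of documents for that pair.
function groupBySubTypeAndUser(documents) {
  const pairs = new Map();
  for (const { subType, username } of documents) {
    const key = JSON.stringify([subType, username]);
    const entry = pairs.get(key) ?? { subType, username, count: 0 };
    entry.count += 1;
    pairs.set(key, entry);
  }
  return [...pairs.values()];
}

// Mimics the second $group: one entry per subType.
// Summing the pair counts gives the total number of documents;
// counting the pairs gives the number of distinct usernames,
// because each username appears exactly once per subType after stage 1.
function groupBySubType(pairs) {
  const result = new Map();
  for (const { subType, count } of pairs) {
    const entry =
      result.get(subType) ??
      { _id: subType, documentCount: 0, distinctUsernameCount: 0 };
    entry.documentCount += count;
    entry.distinctUsernameCount += 1;
    result.set(subType, entry);
  }
  return [...result.values()];
}

const summary = groupBySubType(groupBySubTypeAndUser(docs));
console.log(summary);
```

With this sample data, subType 4 has five documents from three distinct usernames, and subType 7 has one document from one username.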

Here are the test cases I used: test cases

And here is the aggregate result: aggregate result
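As a possible alternative to the double grouping, a single $group can collect the usernames of each subType into a set with $addToSet, and a following $project can take the size of that set with $size. This is only a sketch, not the approach from the answer above: the field names documentCount, usernames and distinctUsernameCount are illustrative, and it assumes the date field is stored as a BSON date, as the queries in this thread imply.

```javascript
// Sketch of an alternative pipeline; field names are illustrative assumptions.
const pipeline = [
  {
    // Same filter as in the question, written without an explicit $and.
    $match: {
      type: 906,
      date: {
        $gte: new Date("2020-11-27T00:00:00.000Z"),
        $lte: new Date("2020-11-27T23:59:59.000Z"),
      },
    },
  },
  {
    // One group per subType: count documents and collect unique usernames.
    $group: {
      _id: "$subType",
      documentCount: { $sum: 1 },
      usernames: { $addToSet: "$username" },
    },
  },
  {
    // Replace the set of usernames with its size.
    $project: {
      documentCount: 1,
      distinctUsernameCount: { $size: "$usernames" },
    },
  },
];

// In the mongo shell this would be run as: db.logs.aggregate(pipeline)
```

Because the whole set of usernames is held inside each group, this variant is probably best suited to cases where the number of distinct usernames per subType stays moderate; the two-stage grouping avoids building that array.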

Upvotes: 1
