ScientiaEtVeritas
ScientiaEtVeritas

Reputation: 5278

Get count and percentage: match of group

Let's say we have a collection like this with user ids and process_name:

--------------------------------------------------
| user    |             process_name             |
--------------------------------------------------
| 12      |                  a                   |
--------------------------------------------------
| 12      |                  b                   |
--------------------------------------------------
| 34      |                  c                   |
--------------------------------------------------
| 36      |                  a                   |
--------------------------------------------------
| 36      |                  d                   |
--------------------------------------------------
| 12      |                  p                   |
--------------------------------------------------

and we would like to get the number of specific processes (here: a and b) used by users, and their percentage in regard to the total process applied by the user.

For user 12 this would be 2 and (2/3 = 0,66) while for user 36 it would be 1 with 1/2 = 0,5.

So, for db.processes.aggregate, I guess we would definetly need a group for users:

$group : {
    _id : "$user",
    total : { $sum : 1 }
}

and a match for the process_names:

$match : {
    "process_name" : {
        $in: ["a", "b"]
    }
}

However, it feels like a chicken-and-egg problem, because after matching I can't retrieve the number of total processes per users anymore. In contrast, after grouping I can't access process_names anymore.

Upvotes: 0

Views: 159

Answers (2)

Gibbs
Gibbs

Reputation: 22974

You can do it with aggregtion

playground

db.collection.aggregate([
  {
    "$group": {//Group by user, count the Total unique processes
      "_id": "$user",
      "processes": {
        "$addToSet": "$process"
      },
      "user": {
        $first: "$user"
      }
    }
  },
  {
    $project: {//Count the matched processes 
      matchCount: {
        $size: {
          "$setIntersection": [
            [
              "a",
              "b"
            ],
            "$processes"
          ]
        }
      },
      "totalCount": {
        $size: "$processes"
      },
      "user": 1
    }
  },
  {
    $project: {//Percentage calculation
      "user": 1,
      "percentage": {
        "$divide": [
          "$matchCount",
          "$totalCount"
        ]
      }
    }
  }
])

//You can definitely use $round to reduce to the required decimals.

Upvotes: 1

varman
varman

Reputation: 8894

You can do like following,

  1. $group to get total of user, put the process_name to array
  2. get the $size of array which includes a or b
  3. get the percentage using $divide

And mongo query

[
  {
    $group: {
      _id: "$user",
      users: {
        $addToSet: "$process_name"
      },
      total: {
        $sum: 1
      }
    }
  },
  {
    $addFields: {
      size_ab: {
        $size: {
          $filter: {
            input: "$users",
            cond: {
              $in: [
                "$$this",
                [
                  "a",
                  "b"
                ]
              ]
            }
          }
        }
      }
    }
  },
  {
    $project: {
      _id: 0,
      user: "$_id",
      percentage: {
        $divide: [
          "$size_ab",
          "$total"
        ]
      }
    }
  }
]

Working Mongo playground

Upvotes: 1

Related Questions