MetaCoder
MetaCoder

Reputation: 488

Mongo aggregation: Return count of distinct values from Key Value object

I'm really hoping someone can help me here, I'm going nuts with this problem :-d

So I have multiple documents (+100,000) which looks like this:

 "_id" : ObjectId("60e5ae42fcc92f14c3a41208"),
    "userId" : "xxxx",
    "projectCreator" : {
        "userId" : "xxx|xxxx"
    },
    "hashTags" : [
        "Spring",
        "Java"
    ],
    "projectCategories" : {
        "60d76ef0597444095b8ab4b2" : "Backend",
        "60d76ef0597444095b8ab232" : "Infrastructure" 
    },
    "createdDate" : ISODate("2021-07-07T13:38:10.655Z"),
    "updatedAt" : ISODate("2021-07-08T11:48:36.200Z"),

    "_class" : "xxxx.model.project.Project"
}

I would like to have a query which does the following:

  1. Extract all the unique projectCategories value (the string value not the id) from all documents in a collection
  2. Count the number of occurrences of each of the values

So the result would look like this:

Backend : NUMBER OF OCCURRENCES
FrontEnd : NUMBER OF OCCURRENCES
Infrastructure: NUMBER OF OCCURRENCES

I "think" I need to do a aggregation and the group the values and then do a count but to be very honest I cant understand this.

I've tried this query:

db.projects.aggregate([ { $match: { isDeleted : {$ne: true} }},{ $match: { projectCategories: { $exists:true, $ne: null }} },{ $project: { result: { $objectToArray: "$projectCategories" } } },{ $unwind : "$result"}])

which will return this:

{ "_id" : ObjectId("60c313e2905d344c7dd117f1"), "result" : { "k" : "60d76f295974444b818ab4bc", "v" : "Apps" } }
{ "_id" : ObjectId("60c313e2905d344c7dd117f1"), "result" : { "k" : "60d76f1759744461468ab4b8", "v" : "Development Tools" } }
{ "_id" : ObjectId("60c313e2905d344c7dd117f1"), "result" : { "k" : "60d76eeb597444b9da8ab4b1", "v" : "Frontend" } }
{ "_id" : ObjectId("60cfb59f30b2647610a6c931"), "result" : { "k" : "60d76eeb597444b9da8ab4b1", "v" : "Frontend" } }
{ "_id" : ObjectId("60cfb59f30b2647610a6c931"), "result" : { "k" : "60d76ef659744422d68ab4b3", "v" : "Fullstack" } }
{ "_id" : ObjectId("60cfb69730b2647610a6c932"), "result" : { "k" : "60d76f295974444b818ab4bc", "v" : "Apps" } }
{ "_id" : ObjectId("60df83e84d8b6341d49cff4e"), "result" : { "k" : "60d76ef0597444095b8ab4b2", "v" : "Backend" } }
{ "_id" : ObjectId("60df83e84d8b6341d49cff4e"), "result" : { "k" : "60d76eeb597444b9da8ab4b1", "v" : "Frontend" } }
{ "_id" : ObjectId("60df83e84d8b6341d49cff4e"), "result" : { "k" : "60d76ef659744422d68ab4b3", "v" : "Fullstack" } }
{ "_id" : ObjectId("60e5ae42fcc92f14c3a41208"), "result" : { "k" : "60d76ef0597444095b8ab4b2", "v" : "Backend" } }
{ "_id" : ObjectId("60f0abf9f5c82b27af712ad7"), "result" : { "k" : "60d76f2559744477168ab4bb", "v" : "Games" } }
{ "_id" : ObjectId("60f0abf9f5c82b27af712ad7"), "result" : { "k" : "60d76ef659744422d68ab4b3", "v" : "Fullstack" } }
{ "_id" : ObjectId("60f68d2df9710f58c1e9c872"), "result" : { "k" : "60d76f295974444b818ab4bc", "v" : "Apps" } }
{ "_id" : ObjectId("60f68d2df9710f58c1e9c872"), "result" : { "k" : "60d76f0e5974448f038ab4b7", "v" : "Open Source" } }
{ "_id" : ObjectId("60f68d2df9710f58c1e9c872"), "result" : { "k" : "60d76eeb597444b9da8ab4b1", "v" : "Frontend" } }

So where I'm stuck now is how do I unwind and get the output to be like:

Backend : NUMBER OF OCCURRENCES
FrontEnd : NUMBER OF OCCURRENCES
Infrastructure: NUMBER OF OCCURRENCES

Can anyone help me here please?

Thanks!

Update: I've managed to get CLOSE with this query:

db.projects.aggregate([ { $match: { isDeleted : {$ne: true} }},{ $match: { projectCategories: { $exists:true, $ne: null }} },{ $project: { result: { $objectToArray: "$projectCategories" } } },{ $unwind : "$result"}, { $group: { _id: "$result.v", count: { $sum: 1 } } }  ] )

However the output is like this now:

{ "_id" : "Development Tools", "count" : 1 }
{ "_id" : "Games", "count" : 1 }
{ "_id" : "Fullstack", "count" : 3 }
{ "_id" : "Open Source", "count" : 1 }
{ "_id" : "Frontend", "count" : 4 }
{ "_id" : "Apps", "count" : 3 }
{ "_id" : "Backend", "count" : 2 }

Is it possible to drop the _id?

Upvotes: 2

Views: 733

Answers (1)

Joe
Joe

Reputation: 28376

You can group again to push the keys and values into an array, and then use $replaceRoot with $arrayToObject:

{$group:{
   _id:null,
   results:{$push: {k:"$_id", v:"$count"}}
}},
{$replaceRoot:{ newRoot: {$arrayToObject:"$results"}}}

Upvotes: 2

Related Questions