Reputation: 2813
I have a set of (~35k) documents which look like this:
{
    "_id" : ObjectId("583dabfc7572394f93ac6ef2"),
    "updatedAt" : ISODate("2016-11-29T16:25:32.130Z"),
    "createdAt" : ISODate("2016-11-29T16:25:32.130Z"),
    "sourceType" : "report",
    "sourceRef" : ObjectId("583da865686e3dfbd977f059"),
    "type" : "video",
    "caption" : "lorem ipsum",
    "timestamps" : {
        "postedAt" : ISODate("2016-08-26T15:09:35.000Z"),
        "monthOfYear" : 7, // 0-based
        "dayOfWeek" : 5, // 0-based
        "hourOfDay" : 16 // 0-based
    },
    "stats" : {
        "comments" : 0,
        "likes" : 8
    },
    "user" : {
        "id" : "123456",
        "username" : "johndoe",
        "fullname" : "John",
        "picture" : ""
    },
    "images" : {
        "thumbnail" : "",
        "low" : "",
        "standard" : ""
    },
    "mentions" : [
        "janedoe"
    ],
    "tags" : [
        "holiday",
        "party"
    ],
    "__v" : 0
}
I want to produce an aggregate report, which will be used to graph frequency of documents by hour of day/day of week/month of year, along with counts for mentions/tags.
{
    // Each frequency is independent of the others,
    // e.g. the total count for each frequency should
    // be ~35k.
    dayFrequency: [
        { day: 0, count: 1400 }, // Monday
        { day: 1, count: 1700 }, // Tuesday
        { day: 2, count: 1800 }, // Wednesday
        { /* etc */ },
        { day: 6, count: 1200 } // Sunday
    ],
    monthFrequency: [
        { month: 0, count: 200 }, // January
        { month: 1, count: 250 }, // February
        { month: 2, count: 300 }, // March
        { /* etc */ },
        { month: 11, count: 150 } // December
    ],
    hourFrequency: [
        { hour: 0, count: 150 }, // 0am
        { hour: 1, count: 200 }, // 1am
        { hour: 2, count: 275 }, // 2am
        { /* etc */ },
        { hour: 23, count: 150 }, // 11pm
    ],
    mentions: {
        janedoe: 12,
        johnsmith: 11,
        peter: 54,
        /* and so on */
    },
    tags: {
        holiday: 872,
        party: 1029,
        /* and so on */
    }
}
Is this possible, and if so, how would I write it? From what I understand, as I am performing an aggregate of all matching documents, it would effectively be one group?
My code so far simply groups all matching records into one group, but I'm not sure how to move forward.
Model.aggregate([
    { $match: { sourceType: 'report', sourceRef: '583da865686e3dfbd977f059' } },
    { $group: {
        _id: '$sourceRef'
    }}
], (err, res) => {
    console.log(err);
    console.log(res);
})
Also acceptable would be counting frequencies as an array of counts (e.g. [ 1400, 1700, 1800, /* etc */ 1200 ]), which led me to look at $count and a few of the other operators; however, again I'm not clear on usage.
Reputation: 103475
At the time of writing, this is not possible in a single pipeline with MongoDB 3.2. From MongoDB 3.4 onwards, however, you can use the $facet stage, which processes multiple aggregation pipelines within a single stage on the same set of input documents. Each sub-pipeline has its own field in the output document, where its results are stored as an array of documents.
For example, the above can be implemented by running the following aggregation pipeline:
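Model.aggregate([
    // Mongoose does not cast aggregation pipeline values to the schema,
    // so sourceRef has to be passed as an ObjectId rather than a string
    // (this assumes `mongoose` is in scope).
    { "$match": {
        "sourceType": "report",
        "sourceRef": new mongoose.Types.ObjectId("583da865686e3dfbd977f059")
    } },
    {
        "$facet": {
            // One sub-pipeline per output field; each sees every matched document.
            "dayFrequency": [
                {
                    "$group": {
                        "_id": "$timestamps.dayOfWeek",
                        "count": { "$sum": 1 }
                    }
                }
            ],
            "monthFrequency": [
                {
                    "$group": {
                        "_id": "$timestamps.monthOfYear",
                        "count": { "$sum": 1 }
                    }
                }
            ],
            "hourFrequency": [
                {
                    "$group": {
                        "_id": "$timestamps.hourOfDay",
                        "count": { "$sum": 1 }
                    }
                }
            ],
            // Arrays are unwound first so that each mention/tag is counted separately.
            "mentions": [
                { "$unwind": "$mentions" },
                {
                    "$group": {
                        "_id": "$mentions",
                        "count": { "$sum": 1 }
                    }
                }
            ],
            "tags": [
                { "$unwind": "$tags" },
                {
                    "$group": {
                        "_id": "$tags",
                        "count": { "$sum": 1 }
                    }
                }
            ]
        }
    }
], (err, res) => {
    console.log(err);
    console.log(res);
})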
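Note that a $facet stage emits a single document, so the report is res[0], and each field in it holds an array of { _id, count } documents in no guaranteed order. If you want the exact shape from the question (or the plain arrays of counts), one option is to reshape the result in application code. The following is only a rough sketch of that idea: the helper names (toFrequency, toCountMap, toCountsArray, toReport) are made up for illustration, and the sample numbers are invented.

```javascript
// Turn [{ _id: 5, count: 10 }, ...] into [{ day: 5, count: 10 }, ...],
// sorted by key because $group does not guarantee output order.
function toFrequency(rows, keyName) {
  return rows
    .map((row) => ({ [keyName]: row._id, count: row.count }))
    .sort((a, b) => a[keyName] - b[keyName]);
}

// Turn [{ _id: 'party', count: 3 }, ...] into { party: 3, ... }.
function toCountMap(rows) {
  const map = {};
  rows.forEach((row) => { map[row._id] = row.count; });
  return map;
}

// Dense array of counts indexed by bucket, with 0 for buckets
// that matched no documents (e.g. size 7 for days, 24 for hours).
function toCountsArray(rows, size) {
  const counts = new Array(size).fill(0);
  rows.forEach((row) => { counts[row._id] = row.count; });
  return counts;
}

// `facetResult` is the single document produced by the $facet stage (res[0]).
function toReport(facetResult) {
  return {
    dayFrequency: toFrequency(facetResult.dayFrequency, 'day'),
    monthFrequency: toFrequency(facetResult.monthFrequency, 'month'),
    hourFrequency: toFrequency(facetResult.hourFrequency, 'hour'),
    mentions: toCountMap(facetResult.mentions),
    tags: toCountMap(facetResult.tags)
  };
}

// Invented sample standing in for res[0]; real output comes from the pipeline.
const sample = {
  dayFrequency: [{ _id: 5, count: 3 }, { _id: 0, count: 2 }],
  monthFrequency: [{ _id: 7, count: 5 }],
  hourFrequency: [{ _id: 16, count: 4 }, { _id: 9, count: 1 }],
  mentions: [{ _id: 'janedoe', count: 2 }],
  tags: [{ _id: 'holiday', count: 3 }, { _id: 'party', count: 4 }]
};

const report = toReport(sample);
const dayCounts = toCountsArray(sample.dayFrequency, 7);
console.log(JSON.stringify(report, null, 2));
console.log(dayCounts);
```

In the callback above you would call toReport(res[0]). Doing the reshaping in Node keeps the pipeline simple; the trade-off is that the whole list of distinct mentions and tags is held in memory, which should be fine for ~35k documents.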