Note: I have included only a few documents in the samples below to keep the post short but still illustrative.
The source collection:
{
"_id" : {
"SpId" : 840,
"Scheduler_Id" : 1,
"Channel_Id" : 2,
"TweetId" : 15
},
"PostDate" : ISODate("2013-10-31T18:30:00Z")
}
{
"_id" : {
"SpId" : 840,
"Scheduler_Id" : 1,
"Channel_Id" : 2,
"TweetId" : 16
},
"PostDate" : ISODate("2013-10-31T18:30:00Z")
}
{
"_id" : {
"SpId" : 840,
"Scheduler_Id" : 1,
"Channel_Id" : 2,
"TweetId" : 17
},
"PostDate" : ISODate("2013-10-30T18:30:00Z")
}
Step-1: Grouping by PostDate
Query:
db.Twitter_Processed.aggregate(
    { $match : { "_id.SpId" : 840, "_id.Scheduler_Id" : 1 } },
    { $project : {
        SpId : "$_id.SpId",
        Scheduler_Id : "$_id.Scheduler_Id",
        day : { $dayOfMonth : '$PostDate' },
        month : { $month : '$PostDate' },
        year : { $year : '$PostDate' },
        senti : "$Sentiment"
    }},
    { $group : {
        _id : { SpId : "$SpId", Scheduler_Id : "$Scheduler_Id", day : '$day', month : '$month', year : '$year' },
        sentiment : { $sum : "$senti" }
    }},
    { $group : { _id : "$_id", avgSentiment : { $avg : "$sentiment" } } }
)
Output:
{
"result" : [
{
"_id" : {
"SpId" : 840,
"Scheduler_Id" : 1,
"day" : 31,
"month" : 10,
"year" : 2013
},
"avgSentiment" : 2.2700000000000005
},
{
"_id" : {
"SpId" : 840,
"Scheduler_Id" : 1,
"day" : 30,
"month" : 10,
"year" : 2013
},
"avgSentiment" : 4.96
}
]
}
Step-2: The output I am trying to achieve:
{
"result" : [
{
"_id" : {
"SpId" : 840,
"Scheduler_Id" : 1,
"Date" : ISODate("2013-10-31T18:30:00Z")
},
"avgSentiment" : 2.2700000000000005
},
{
"_id" : {
"SpId" : 840,
"Scheduler_Id" : 1,
"Date" : ISODate("2013-10-30T18:30:00Z")
},
"avgSentiment" : 4.96
}
]
}
The query I attempted:
db.Twitter_Processed.aggregate(
    { $match : { "_id.SpId" : 840, "_id.Scheduler_Id" : 1 } },
    { $project : {
        SpId : "$_id.SpId",
        Scheduler_Id : "$_id.Scheduler_Id",
        day : { $dayOfMonth : '$PostDate' },
        month : { $month : '$PostDate' },
        year : { $year : '$PostDate' },
        senti : "$Sentiment"
    }},
    { $group : {
        _id : { SpId : "$SpId", Scheduler_Id : "$Scheduler_Id", day : '$day', month : '$month', year : '$year' },
        sentiment : { $sum : "$senti" }
    }},
    { $group : { _id : "$_id", avgSentiment : { $avg : "$sentiment" } } },
    { $project : {
        _id : {
            SpId : "$_id.SpId",
            Scheduler_Id : "$_id.Scheduler_Id",
            date : new Date("$_id.year", "$_id.month", "$_id.day")
        },
        avgSentiment : "$avgSentiment"
    }}
)
Output (error):
Error: Printing Stack Trace
at printStackTrace (src/mongo/shell/utils.js:37:15)
at DBCollection.aggregate (src/mongo/shell/collection.js:897:9)
at (shell):1:22
Tue Dec 31 09:41:42.916 JavaScript execution failed: aggregate failed: {
"errmsg" : "exception: disallowed field type Date in object expression (at 'date')",
"code" : 15992,
"ok" : 0
} at src/mongo/shell/collection.js:L898
How do I achieve Step-2?
As you've noticed, the Aggregation Framework (as at MongoDB 2.4) has operators to extract parts of dates but not to easily create date fields.
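One likely reason the attempted query fails with "disallowed field type Date" is that the shell is a JavaScript interpreter: `new Date(...)` runs on the client before the pipeline is sent, so the server never sees an expression, only a Date value. The following plain-JavaScript sketch (not mongo-specific, and only an illustration of that assumption) shows what the constructor produces when it is handed field-path strings:

```javascript
// Evaluated locally: "$_id.year" etc. are ordinary strings here,
// not field references that the server resolves per document.
var attempted = new Date("$_id.year", "$_id.month", "$_id.day");

// Each string argument is coerced to a number (NaN), so the result is an Invalid Date.
var isInvalid = isNaN(attempted.getTime());
console.log(isInvalid); // true
```

In other words, the date would have to be computed by aggregation operators on the server rather than by a constructor call in the shell.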
There's a great blog post, "Stupid date tricks with Aggregation Framework", that provides a creative workaround: truncate the date granularity using $project before you $group:
db.Twitter_Processed.aggregate(
// Match (can take advantage of suitable index)
{ $match : {
"_id.SpId" : 840,
"_id.Scheduler_Id" : 1
}},
// Extract h/m/s/ms values from PostDate for rounding
{ $project: {
SpId : "$_id.SpId",
Scheduler_Id : "$_id.Scheduler_Id",
PostDate : "$PostDate",
h : { "$hour" : "$PostDate" },
m : { "$minute" : "$PostDate" },
s : { "$second" : "$PostDate" },
ms : { "$millisecond" : "$PostDate" },
senti : "$Sentiment"
}},
// Subtract the h/m/s/ms values to round the date off to yyyy-mm-dd
{ $project: {
SpId : "$_id.SpId",
Scheduler_Id : "$_id.Scheduler_Id",
// PostDate will end up truncated to yyyy-mm-dd granularity
PostDate: {
"$subtract" : [
"$PostDate",
{
"$add" : [
"$ms",
{ "$multiply" : [ "$s", 1000 ] },
{ "$multiply" : [ "$m", 60, 1000 ] },
{ "$multiply" : [ "$h", 60, 60, 1000 ]}
]
}
]
},
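// Carry forward the field renamed in the previous $project ("$Sentiment" no longer exists at this stage)
senti: "$senti"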
}},
{ $group : {
_id : {
SpId : "$SpId",
Scheduler_Id : "$Scheduler_Id",
PostDate: "$PostDate"
},
sentiment : { $sum : "$senti"}
}},
{ $group : {
_id : "$_id" ,
avgSentiment : {$avg : "$sentiment"}
}}
)
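To check what the $subtract expression computes, the same arithmetic can be reproduced in plain JavaScript. This is only a sketch: `truncateToDay` is a hypothetical helper name, not part of MongoDB, and it assumes the aggregation date operators work in UTC, so the truncation lands on UTC midnight rather than local midnight.

```javascript
// Hypothetical helper mirroring the two $project stages of the pipeline.
function truncateToDay(postDate) {
    // Counterparts of $hour / $minute / $second / $millisecond (UTC-based)
    var h = postDate.getUTCHours();
    var m = postDate.getUTCMinutes();
    var s = postDate.getUTCSeconds();
    var ms = postDate.getUTCMilliseconds();

    // Counterpart of the $add / $multiply expression: milliseconds elapsed since midnight UTC
    var elapsed = ms + s * 1000 + m * 60 * 1000 + h * 60 * 60 * 1000;

    // Counterpart of $subtract: a date minus a number of milliseconds
    return new Date(postDate.getTime() - elapsed);
}

var truncated = truncateToDay(new Date("2013-10-31T18:30:00Z"));
console.log(truncated.toISOString()); // 2013-10-31T00:00:00.000Z
```

With the sample documents, the 18:30Z timestamps would therefore be grouped under their UTC calendar day; grouping by a local day would need an offset applied before truncating.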