Reputation: 363
Consider a collection consisting of documents of the following form:
{
"_id" : ObjectId("55f3600da9fb6e4f937a50a7"),
"timestamp" : ISODate("2010-01-01T08:10:00Z"),
"temperature" : 12.31
}
{
"_id" : ObjectId("55f3600da9fb6e4f937a50a8"),
"timestamp" : ISODate("2010-01-01T08:15:00Z"),
"temperature" : 12.48
}
...
I want to find the maximum temperature for each day of the year, and the time at which it occurred. The first part is easy enough with an aggregation pipeline:
[{"$group" : {"_id" : {"day": { "$dayOfYear": "$timestamp" }},
"max_temperature": {"$max" : "$temperature"}}},
{"$sort" : {"_id.day":1}}]
This gives me a nice result set with the max temperature for each day:
{u'max_temperature': 20.98, u'_id': {u'day': 1}}
{u'max_temperature': 24.15, u'_id': {u'day': 2}}
{u'max_temperature': 22.02, u'_id': {u'day': 3}}
...
But, how do I get the timestamp of when each daily maximum occurred? Something like:
{u'max_temperature': 20.98, u'time_of_max': ISODate("2010-01-01T15:11:12"), u'_id': {u'day': 1}}
{u'max_temperature': 24.15, u'time_of_max': ISODate("2010-01-02T16:03:42"), u'_id': {u'day': 2}}
{u'max_temperature': 22.02, u'time_of_max': ISODate("2010-01-03T16:33:59"), u'_id': {u'day': 3}}
...
Upvotes: 3
Views: 2427
Reputation: 2092
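db.temp.aggregate([
  {$project: {date: {$dayOfYear: "$timestamp"}, temp: "$temperature", ISODate: "$timestamp"}},
  // sort so that the hottest reading of each day comes first; without this, $first returns an arbitrary timestamp
  {$sort: {date: 1, temp: -1}},
  {$group: {_id: "$date", maxtemp: {"$first": "$temp"}, timestamp: {"$first": "$ISODate"}}},
  {"$sort": {_id: 1}}
])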
Upvotes: 0
Reputation: 50406
Use $sort first and then the $first operator instead of $max. The catch is that the "day" value has to be computed before you can sort on it:
[
{ "$project": {
"day": { "$dayOfYear": "$timestamp" },
"timestamp": 1,
"temperature": 1
}},
{ "$sort": { "day": 1, "temperature": -1 } },
{ "$group": {
"_id" : "$day",
"max_temperature": { "$first": "$temperature" },
"timestamp": { "$first": "$timestamp" }
}},
{ "$sort": { "_id":1 } }
]
Once you have that projected value for "day", the input can be ordered by day with the largest "temperature" value first. The $first grouping operator then picks fields from the first document found at each grouping boundary.
So the "temperature" is the "maximum" because of the sort order, and the other fields will be from the same document where that value occurs.
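To make the sort-then-$first idea concrete, here is a minimal sketch in plain Python that emulates the three stages on an in-memory list. It does not talk to MongoDB; the sample readings and the helper names day_of_year and daily_max are made up for illustration.

```python
from datetime import datetime
from itertools import groupby

# Hypothetical sample readings, shaped like the documents in the question.
readings = [
    {"timestamp": datetime(2010, 1, 1, 8, 10), "temperature": 12.31},
    {"timestamp": datetime(2010, 1, 1, 15, 11), "temperature": 20.98},
    {"timestamp": datetime(2010, 1, 1, 20, 0), "temperature": 14.02},
    {"timestamp": datetime(2010, 1, 2, 9, 0), "temperature": 18.40},
    {"timestamp": datetime(2010, 1, 2, 16, 3), "temperature": 24.15},
]


def day_of_year(doc):
    # Rough equivalent of {"$dayOfYear": "$timestamp"}.
    return doc["timestamp"].timetuple().tm_yday


def daily_max(docs):
    # Emulates {"$sort": {"day": 1, "temperature": -1}}.
    ordered = sorted(docs, key=lambda d: (day_of_year(d), -d["temperature"]))
    results = []
    # Emulates $group with $first: take the first document of each day.
    for day, group in groupby(ordered, key=day_of_year):
        first = next(group)
        results.append({
            "_id": day,
            "max_temperature": first["temperature"],
            "timestamp": first["timestamp"],
        })
    return results


result = daily_max(readings)
for row in result:
    print(row)
```

Because both fields are read from the same "first" document, the timestamp always belongs to the reading that holds the maximum.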
Technically, this should still work:
[
{ "$sort": { "temperature": -1, "timestamp": 1 } },
{ "$group": {
"_id" : { "$dayOfYear": "$timestamp" },
"max_temperature": { "$first": "$temperature" },
"timestamp": { "$first": "$timestamp" }
}},
{ "$sort": { "_id":1 } }
]
But because the initial sort order is not aligned with the grouping key, it is likely to be less efficient overall.
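As a rough illustration of why the second form still returns the right answer, the plain-Python sketch below sorts the whole input by temperature (descending) and timestamp (ascending), then keeps the first document seen for each day. It is an emulation, not MongoDB code; the sample data and the name daily_max_global_sort are hypothetical.

```python
from datetime import datetime

# Hypothetical readings; day 1 has a tie at 20.98 to show the timestamp tie-break.
readings = [
    {"timestamp": datetime(2010, 1, 1, 16, 0), "temperature": 20.98},
    {"timestamp": datetime(2010, 1, 1, 15, 11), "temperature": 20.98},
    {"timestamp": datetime(2010, 1, 1, 8, 10), "temperature": 12.31},
    {"timestamp": datetime(2010, 1, 2, 16, 3), "temperature": 24.15},
    {"timestamp": datetime(2010, 1, 2, 9, 0), "temperature": 18.40},
]


def daily_max_global_sort(docs):
    # Emulates {"$sort": {"temperature": -1, "timestamp": 1}} over the whole input.
    ordered = sorted(docs, key=lambda d: (-d["temperature"], d["timestamp"]))
    first_per_day = {}
    for doc in ordered:
        day = doc["timestamp"].timetuple().tm_yday
        # Emulates $first: only the first document seen for a day is kept.
        first_per_day.setdefault(day, doc)
    # Emulates the final {"$sort": {"_id": 1}}.
    return [
        {
            "_id": day,
            "max_temperature": first_per_day[day]["temperature"],
            "timestamp": first_per_day[day]["timestamp"],
        }
        for day in sorted(first_per_day)
    ]


result = daily_max_global_sort(readings)
for row in result:
    print(row)
```

Within any single day the hottest reading still appears before the others, so the first document seen per day is the daily maximum, and the secondary sort on timestamp picks the earliest occurrence when there is a tie.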
Upvotes: 2