opcode

Reputation: 469

MongoDB aggregation pipeline with loop

I have the aggregation pipeline below, which I would like to run for every day of the year. Essentially it calculates the minimum, maximum and average temperature (the "TEMP" field) for each day. At the moment I call this piece of code 365 times, passing the start date and the end date of one day each time.

Obviously this is very inefficient. Is there any way to loop this within MongoDB so that it's faster and returns an array of 365 average values, 365 min values and 365 max values, or something like that? I'm using a timezone library to derive the start date and end date.

collection.aggregate([
    {
    $match:{$and:[
        {"UID"  : uid},
        {"TEMP" :{$exists:true}},
        {"site" : "SITE123"},
        {"updatedAt": {$gte : new Date(START_DATE_ARG), $lte : new Date(END_DATE_ARG)} }
        ]}
    },

    { "$group": {
        "_id": "$UID",
        "avg": { $avg: "$TEMP" },
        "min": { $min: "$TEMP" },
        "max": { $max: "$TEMP" }
        }
    }
], function(err, result){
    if (err){
        cb(1, err);
    }
    else{
        cb(0, result);
    }
});

The datasets look like this

....
{UID: "123", TEMP: 11, site: "SITE123", updatedAt: ISODate("2014-09-12T21:55:19.326Z")}
{UID: "123", TEMP: 10, site: "SITE123", updatedAt: ISODate("2014-09-12T21:55:20.491Z")}
....

Any ideas? Maybe we can pass all the timestamps of all the days of the year in the aggregation pipeline?

Thank you!!

Upvotes: 1

Views: 8058

Answers (1)

Neil Lunn

Reputation: 151082

Why run this for every day when you can simply make the date part of the grouping key? This is what the date aggregation operators exist for, so you can aggregate by time frames in a whole period at once without looping:

collection.aggregate([
    // Select the whole period at once rather than a single day
    { "$match":{
        "UID": uid,
        "TEMP":{ "$exists": true },
        "site" : "SITE123",
        "updatedAt": {
            "$gte": new Date(START_DATE_ARG),
            "$lte": new Date(END_DATE_ARG)
        }
    }},

    // One group per UID and calendar day
    { "$group": {
        "_id": { 
            "uid": "$UID",
            "year": { "$year": "$updatedAt" },
            "month": { "$month": "$updatedAt" },
            "day": { "$dayOfMonth": "$updatedAt" }
        },
        "avg": { "$avg": "$TEMP" },
        "min": { "$min": "$TEMP" },
        "max": { "$max": "$TEMP" }
    }}
])
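
To see what this grouping key produces, the sketch below mimics the $group stage in plain JavaScript on the two sample documents from the question. It is an illustration only, not how MongoDB runs the stage. `groupByDay` is a made-up helper, and it uses the UTC getters because the date operators work in UTC unless told otherwise.

```javascript
// Sample documents copied from the question; plain Date objects stand in for ISODate.
const docs = [
  { UID: "123", TEMP: 11, site: "SITE123", updatedAt: new Date("2014-09-12T21:55:19.326Z") },
  { UID: "123", TEMP: 10, site: "SITE123", updatedAt: new Date("2014-09-12T21:55:20.491Z") }
];

// Hypothetical helper: an in-memory imitation of grouping by uid/year/month/day.
function groupByDay(documents) {
  const groups = new Map();
  for (const doc of documents) {
    const d = doc.updatedAt;
    const id = {
      uid: doc.UID,
      year: d.getUTCFullYear(),
      month: d.getUTCMonth() + 1, // $month is 1-based, getUTCMonth() is 0-based
      day: d.getUTCDate()
    };
    const key = JSON.stringify(id);
    if (!groups.has(key)) {
      groups.set(key, { _id: id, sum: 0, count: 0, min: Infinity, max: -Infinity });
    }
    const g = groups.get(key);
    g.sum += doc.TEMP;
    g.count += 1;
    g.min = Math.min(g.min, doc.TEMP);
    g.max = Math.max(g.max, doc.TEMP);
  }
  // Turn the running totals into the avg/min/max shape of the pipeline output.
  return Array.from(groups.values()).map(g => ({
    _id: g._id,
    avg: g.sum / g.count,
    min: g.min,
    max: g.max
  }));
}

const grouped = groupByDay(docs);
console.log(grouped);
```

Both sample readings fall on the same UTC day, so they collapse into a single result document for 2014-09-12.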

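Or possibly condense the date to a timestamp value instead. This is a little trick of date math with date objects: subtracting the epoch date from "updatedAt" gives milliseconds since the epoch, and subtracting the remainder of that value modulo one day rounds it down to the start of the day: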

collection.aggregate([
    { "$match":{
        "UID": uid,
        "TEMP":{ "$exists": true },
        "site" : "SITE123",
        "updatedAt": {
            "$gte": new Date(START_DATE_ARG),
            "$lte": new Date(END_DATE_ARG)
        }
    }},

    { "$group": {
        "_id": { 
            "uid": "$UID",
            // Milliseconds since the epoch, rounded down to a whole day
            "date": {
                "$subtract": [
                    { "$subtract": [ "$updatedAt", new Date("1970-01-01") ] },
                    { "$mod": [
                        { "$subtract": [ "$updatedAt", new Date("1970-01-01") ] },
                        1000 * 60 * 60 * 24
                    ]}
                ]
            }
        },
        "avg": { "$avg": "$TEMP" },
        "min": { "$min": "$TEMP" },
        "max": { "$max": "$TEMP" }
    }}
])
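
The same arithmetic can be checked outside the database. The following is a minimal sketch in plain JavaScript, where `startOfUtcDay` is a hypothetical helper that mirrors the nested $subtract / $mod expression on the two timestamps from the question:

```javascript
const MS_PER_DAY = 1000 * 60 * 60 * 24;

// Hypothetical helper mirroring the nested $subtract / $mod expression.
function startOfUtcDay(date) {
  // Subtracting one Date from another yields the difference in milliseconds.
  const sinceEpoch = date - new Date("1970-01-01");
  // Dropping the remainder of a whole day rounds down to midnight UTC.
  return sinceEpoch - (sinceEpoch % MS_PER_DAY);
}

const first = startOfUtcDay(new Date("2014-09-12T21:55:19.326Z"));
const second = startOfUtcDay(new Date("2014-09-12T21:55:20.491Z"));

console.log(first === second);              // true
console.log(new Date(first).toISOString()); // "2014-09-12T00:00:00.000Z"
```

Both readings reduce to the same value, which is why they end up in the same group. Note that the rounding is to midnight UTC, so days in another timezone would need an offset applied first.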

Of course, your "date range" in the $match is now the whole period you want in the results, so it runs from the start date of the first day to the end date of the last day you intended to loop over. The grouping in either case reflects "one day", but you could change that to any interval you want.
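
If you still want the three arrays described in the question, the grouped results can be spread over one slot per day afterwards. This is only a sketch: `toDailyArrays` is a hypothetical helper, it assumes results shaped like the year/month/day grouping key, and it leaves `null` for days without readings.

```javascript
// Hypothetical helper: places each grouped result into its day-of-year slot.
function toDailyArrays(results, year) {
  const MS_PER_DAY = 1000 * 60 * 60 * 24;
  const yearStart = Date.UTC(year, 0, 1);
  // 365 or 366, derived from the distance to the next 1 January (UTC, so no DST).
  const daysInYear = Math.round((Date.UTC(year + 1, 0, 1) - yearStart) / MS_PER_DAY);
  const out = {
    avg: new Array(daysInYear).fill(null),
    min: new Array(daysInYear).fill(null),
    max: new Array(daysInYear).fill(null)
  };
  for (const r of results) {
    if (r._id.year !== year) continue; // ignore results from other years
    const dayStart = Date.UTC(r._id.year, r._id.month - 1, r._id.day);
    const index = (dayStart - yearStart) / MS_PER_DAY; // 0 for 1 January
    out.avg[index] = r.avg;
    out.min[index] = r.min;
    out.max[index] = r.max;
  }
  return out;
}

// Result shape assumed from the year/month/day grouping; values from the sample data.
const sample = [
  { _id: { uid: "123", year: 2014, month: 9, day: 12 }, avg: 10.5, min: 10, max: 11 }
];
const daily = toDailyArrays(sample, 2014);
console.log(daily.avg.length, daily.avg[254]);
```

With this approach the database is queried once for the whole year, and the per-day layout is a cheap client-side step.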

Also note that your use of $and here is not necessary. MongoDB queries "and" their conditions by default. The only time you need that operator is when several conditions would have to sit under the same key, which would otherwise not be valid JSON/BSON.
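
A small sketch of that point, using plain JavaScript objects only (nothing is sent to a database). The value of `uid`, the site "SITE456" and the TEMP thresholds are made up for illustration:

```javascript
// Placeholder value; in the question `uid` comes from the surrounding code.
const uid = "123";

// Explicit $and, as in the question.
const withAnd = { $and: [ { UID: uid }, { TEMP: { $exists: true } }, { site: "SITE123" } ] };

// Equivalent implicit form: conditions on different keys are combined with AND.
const implicit = { UID: uid, TEMP: { $exists: true }, site: "SITE123" };

// Repeating a key in an object literal keeps only the last value,
// so the first $or here is lost before the query is ever built.
const lossy = {
  $or: [ { site: "SITE123" }, { site: "SITE456" } ],
  $or: [ { TEMP: { $lt: 0 } }, { TEMP: { $gt: 30 } } ]
};

// This is the kind of case where $and is needed: both $or clauses survive.
const explicit = {
  $and: [
    { $or: [ { site: "SITE123" }, { site: "SITE456" } ] },
    { $or: [ { TEMP: { $lt: 0 } }, { TEMP: { $gt: 30 } } ] }
  ]
};

console.log(Object.keys(lossy).length); // 1
console.log(explicit.$and.length);      // 2
```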

Upvotes: 1
