Nick Parsons
Nick Parsons

Reputation: 8607

Calculating multiple averages using MongoDB Aggregation

I've been tasked with generating averages for day, week, month, and year for a rather large set of documents in MongoDB.

All of the jobs have a created field, and I need to base the average values off of the outputs array...

Here's what a document looks like:

{
  __v: 0,
  _id: ObjectId("535837911393fd0200d8e1eb"),
  created: ISODate("2014-04-23T21:58:41.446Z"),
  output: [
    {
      ref: {
        img: false
      },
      type: "image/png",
      methods: [
        {
          options: {
            height: 200,
            width: 200
          },
          method: "resize"
        }
      ]
    },
    {
      ref: {
        img: false
      },
      type: "image/png",
      methods: [
        {
          options: {
            height: 400,
            width: 400
          },
          method: "resize"
        }
      ]
    }
  ]
}

And here is what my current script looks like:

JobModel.aggregate([
    {
        $unwind: '$output'
    },
    {
        $group: {
            _id: { $dayOfYear: '$created' },
            day: { $sum: 1 }
        }
},
{
    $group: {
        _id: null,
        avgDay: { $avg: '$day' }
    }
},
{
        $project: {
            _id: 0,
            average: {
                day: '$avgDay'
            }
        }
    }
],
function(err, data) {

    if (err) {
        console.log(err);
        return;
    }

    res.send(data);
    next();

});

I cannot seem to figure out the right order for this. Any suggestions?

Upvotes: 1

Views: 1941

Answers (1)

Neil Lunn
Neil Lunn

Reputation: 151112

Really not that sure what you are after here. You say that you want "multiple" averages but that brings up the question of "muliple" over what basis? The average "output" entries over a individual day would be different from the average output entries per month or even per daily average per month. So the scale changes with each selection and is not really a single query for "daily", "monthly" and "yearly"

I would seem that you really was "discrete" totals which would be best approached by first finding the "size" of the output entries and then applying an average per scale:

JobModel.aggregate(
    [
        { "$unwind": "$output" },

        // Count the array entries on the record
        { "$group": {
            "_id": "$_id",
            "created": { "$first": "$created" },
            "count": { "$sum": 1 }
        }},

        // Now get the average per day
        { "$group": {
            "_id": { "$dayOfYear": "$created" },
            "avg": { "$avg": "$count" }
        }}
    ],
    function(err,result) {

    }
);

Or actually with MongoDB 2.6 and greater you can just use the $size operator on the array:

JobModel.aggregate(
    [
        // Now get the average per day
        { "$group": {
            "_id": { "$dayOfYear": "$created" },
            "avg": { "$avg": { "$size": "$output" } }
        }}
    ],
    function(err,result) {

    }
);

So the logical thing is to run each of those within your required $match range other your aggregation key of either "day", "month" or "year"

You could do something like combining the daily averages per day, with the daily average per month and then daily for year by combining results into arrays, otherwise you would just be throwing items away, which can be alternately done if you "just" wanted the daily average for the year, but as full results:

JobModel.aggregate(
    [
        // Now get the average per day
        { "$group": {
            "_id": { 
                "year": { "$year": "$created" },
                "month": { "$month": "$created" },
                "day": { "$dayOfYear": "$created" }
            },
            "dayAvg": { "$avg": { "$size": "$output" } }
        }},

        // Group for month
        { "$group": {
            "_id": {
                "year": "$_id.year",
                "month": "$_id.month"
            },
            "days": { 
                "$push": {
                    "day": "$_id.day",
                    "avg": "$dayAvg"
                }
            },
            "monthAvg": { "$avg": "$dayAvg" }
        }},

        // Group for the year
        { "$group": {
            "_id": "$_id.year",
            "daily": { "$avg": "$monthAvg" },
            "months": {
                "$push": {
                    "month": "$_id.month",
                    "daily": "$monthAvg",
                    "days": "$days"
                }
           }
        }}
    ],
    function(err,result) {

    }
);

However you want to apply that, but the main thing missing from your example is finding the "size" or "count" of the original "output" array per document from which to obtain an average.

Upvotes: 1

Related Questions