Baklap4
Baklap4

Reputation: 4202

MongoDB Aggregation AVG()

Hey Guys i'm really a new one if it comes to aggregation so please help me through this.

Let's say I have multiple documents (over time) like this:

{
  "_id": ObjectId("574d6175da461e77030041b7"),
  "hostname": "VPS",
  "timestamp": NumberLong(1460040691),
  "cpuCores": NumberLong(2),
  "cpuList": [
    {
      "name": "cpu1",
      "load": 3.4
    },
    {
      "name": "cpu2",
      "load": 0.7
    }
  ]
},
{
  "_id": ObjectId("574d6175da461e77030041b7"),
  "hostname": "VPS",
  "timestamp": NumberLong(1460040700),
  "cpuCores": NumberLong(2),
  "cpuList": [
    {
      "name": "cpu1",
      "load": 0.4
    },
    {
      "name": "cpu2",
      "load": 6.7
    }
  ]
},
{
  "_id": ObjectId("574d6175da461e77030041b7"),
  "hostname": "VPS",
  "timestamp": NumberLong(1460041000),
  "cpuCores": NumberLong(2),
  "cpuList": [
    {
      "name": "cpu1",
      "load": 25.4
    },
    {
      "name": "cpu2",
      "load": 1.7
    }
  ]
}

I'd like to get the average cpu load over X time. Where X is equal to 300 seconds.

So with the example above one would get a resultset which looks like this:

{
    "avgCPULoad": "2.8",
    "timestamp": NumberLong(1460040700)
},
{
    "avgCPULoad": "13.55",
    "timestamp": NumberLong(1460041000)
}

avgCpuLoad is calculated like this:

  1. Grab all documents within 300 seconds of eachother
  2. Calculate average:
    1. (((3.4+0.7)/2)+((0.4+6.7)/2))/2 = 2.8
    2. ((25.4+1.7)/2) = 13.55
  3. Add Last Timestamp from the selected documents.

I know how i'm getting every document for every x time. That's done like this:

db.Pizza.aggregate(
[
    {
        $group: 
        {
            _id:
            {
                $subtract: [
                    '$timestamp',   
                    {
                        $mod: ['$timestamp', 300]
                    }
                ]
            },
            'timestamp': {$last:'$timestamp'}
        },
    {
        $project: {_id: 0, timestamp:'$timestamp'}
    }
])

But how would one get the averages calculated like above? I've tried a bit with $unwind but not giving the results i'd like..

Upvotes: 1

Views: 754

Answers (2)

chridam
chridam

Reputation: 103305

You need to run the following aggregation operation to get the desired result:

db.collection.aggregate([ 
    { "$unwind": "$cpuList" },
    {
        "$group": {
            "_id": {                
                "interval": {
                    "$subtract": [ 
                        "$timestamp",                        
                        { "$mod": [ "$timestamp", 60 * 5 ] }
                    ]
                }
            },             
            "avgCPULoad": { "$avg": "$cpuList.load" },
            "timestamp": { "$max": "$timestamp" }
        } 
    },
    {
        "$project": { "_id": 0, "avgCPULoad": 1, "timestamp": 1 }
    }
])

The above groups the flattened documents by a 5 minute interval (depicted in seconds); the interval key is deduced by subtracting the timestamp in seconds from the remainder you get when the actual timestamp is divided by the 5 minutes interval (in seconds)

Sample Output

/* 1 */
{
    "avgCPULoad" : 13.55,
    "timestamp" : NumberLong(1460041000)
}

/* 2 */
{
    "avgCPULoad" : 2.8,
    "timestamp" : NumberLong(1460040700)
}

Upvotes: 1

Jan Doornbos
Jan Doornbos

Reputation: 751

The solution to this is to use unwind on the array (cpulist). I've made an example query for you:

db.CpuInfo.aggregate([
    {
        $unwind: '$cpuList'
    },
    {
        $group: {
            _id:{
                $subtract:[
                    '$timestamp', 
                    {$mod: ['$timestamp', 300]}
                ]
            }, 
            'timestamp':{$last:'$timestamp'},
            'cpuList':{$avg:'$cpuList.load'}
        }
    }
])

Upvotes: 1

Related Questions