Nyan Maru
Nyan Maru

Reputation: 87

MongoDB, TimeSeries and Aggregation Framework

So, I’m working in this time series data with MongoDB and found this: http://blog.mongodb.org/post/65517193370/schema-design-for-time-series-data-in-mongodb

It says that this kind of structure wouldn’t take advantage of the “expressive nature of the document model”:

{
  timestamp: ISODate("2013-10-10T23:06:37.000Z"),
  type: ”memory_used”,
  value: 1000000
}

And they suggest something like this (minutes/second):

{
  timestamp_hour: ISODate("2013-10-10T23:00:00.000Z"),
  type: “memory_used”,
  values: {
    0: { 0: 999999, 1: 999999, …, 59: 1000000 },
    1: { 0: 2000000, 1: 2000000, …, 59: 1000000 },
    …,
  }
}

So I decided to model my data like this, (year/month):

{
  "unity_of_measure": "MWh",
  "region": "North",
  "consumption": {
    2014: {1: 2568.652, 2: 2614.621, 3: 2711.012 } ,
    2015: {1: 2751.234, 2: 2752.231, 3: 2752.215 } ,
    …
  }
}

I don’t know if I got something wrong but it seems like the Aggregation Framework is not able to get max/min/sum/avg values, for example, with this kind of modeling.

So I changed my modeling to this:

{
  "unity_of_measure": "MWh",
  "region": "North",
  "consumption": [
    { "year": 2014,
      "month": 1,
      "value": 2568.652
    },
    { "year": 2014,
      "month": 2,
      "value": 2614.621
    }
    ...
  ]
}

But now I find myself not taking the advantage of the “expressive nature of the document model”. So I’m really confused about this.

Is there any way to use the aggregation framework with the modeling suggested by mongo’s blog? The idea here is not to use map reduce or something like this.

If it’s not possible, how bad is the second modeling when it comes to performance?

Sorry about the long question, I wish I could post a potato here.

Thanks in advance.

Upvotes: 1

Views: 141

Answers (1)

Steve Tarver
Steve Tarver

Reputation: 3228

I think you missed something important in the article -> they update summary statistic fields on every update so they can calculate average in the application. They are using $set on the time series data and $inc on num_samples and total_samples because document field updates are very cheap and allow them to reach the scale they are at. The time series data is used for display - not for calculations in mongo.

Aggregate and map functions operate on lists - your first example has none - it has a sub document with time series data. You could average in an aggregation pipeline but it would be very ugly and simpler to do in application code.

If you want to do your calculations in mongo, you have to organize your data in some kind of list, like your second example.

Using that second schema as a hint, here is how you could do the calculations. Open a mongo shell, use a scratch db like test, and paste this in to see it work.

// clean up from previous run
db.timeSeries2.drop();

// Given data like
db.timeSeries2.insert(
    {
      "unit_of_measure": "MWh",
      "region": "North",
      "consumption": [
        {
          "year": 2014,
          "month": 1,
          "value": 50
        },
        {
          "year": 2014,
          "month": 2,
          "value": 100
        },
        {
          "year": 2014,
          "month": 3,
          "value": 150
        },
        {
          "year": 2015,
          "month": 1,
          "value": 500
        },
        {
          "year": 2015,
          "month": 2,
          "value": 1000
        },
        {
          "year": 2015,
          "month": 3,
          "value": 1500
        }
      ]
    }
);

// aggregation to provide min/max/sum/average
db.timeSeries2.aggregate([
  {$match: {region: 'North', unit_of_measure: 'MWh'}},
  {$unwind: '$consumption'},
  {$match: {'consumption.year': 2014}},
  {
    $group: {
      _id: 'summary',
      avg: {$avg: '$consumption.value'},
      sum: {$sum: '$consumption.value'},
      min: {$min: '$consumption.value'},
      max: {$max: '$consumption.value'}
    }
  }
])

produces

{
  "result": [
    {
      "_id": "summary",
      "avg": 100,
      "sum": 300,
      "min": 50,
      "max": 150
    }
  ],
  "ok": 1
}

Upvotes: 1

Related Questions