Reputation: 14590
I have MongoDB documents like the ones explained here https://www.mongodb.com/blog/post/schema-design-for-time-series-data-in-mongodb
So there is one document per day (and per type and system), with a `values` field that contains hours, minutes, and seconds as nested data, like this:
{
    "_id" : ObjectId("59fc57d75bc7315366b78799"),
    "date" : ISODate("2017-11-03T00:00:00.000+0000"),
    "system" : "192-168-1-30",
    "type" : "memory",
    "values" : {
        [...]
        "11" : { // hour 11
            [...]
            "49" : { // minute 49
                [...]
                "43" : NumberInt(62171000), // second 43
                "44" : NumberInt(62169000),
                [...]
            },
            "50" : {
                "1" : NumberInt(62363000),
                "2" : NumberInt(62319000)
                [...]
            },
            [...]
        },
        [...]
    },
    "updatedAt" : ISODate("2017-11-03T13:34:00.720+0000"),
    "createdAt" : ISODate("2017-11-03T11:49:43.442+0000")
}
Here, for example, at 11:49:43 on 2017-11-03 the memory value was 62171000.
Now I'm trying to aggregate these documents to obtain rows with average data per minute, per hour, and so on, but I'm confused about how to tell the aggregation framework that `values` is a nested structure of hours, minutes, and seconds.
Or should I use map/reduce?
Any hints?
Upvotes: 2
Views: 2926
Reputation: 37108
You are missing the essential point of the article, which basically describes a system with pre-aggregated data:
The fields "num_samples" and "total_samples" are updated as new readings are applied to the document:
{$set: {"values.59": 2000000 }},
{$inc: {num_samples: 1, total_samples: 2000000 }}
So each document carries the number of readings and their running total. If you added these 2 fields to your documents, you could easily calculate the daily average by dividing total_samples by num_samples.
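Assuming the two counter fields are maintained as in the article, the daily average is a single $project stage. A minimal sketch (the pipeline variable name is mine; in the mongo shell you would pass it to db.collection.aggregate):

```javascript
// Hypothetical daily-average pipeline, assuming each per-day document
// carries the pre-aggregated counters num_samples and total_samples.
// Usage in the shell: db.collection.aggregate(dailyAvgPipeline)
const dailyAvgPipeline = [
  { $project: {
      date: 1, system: 1, type: 1,
      // average = running total divided by number of readings
      avg: { $divide: ["$total_samples", "$num_samples"] }
  }}
];
```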
The article is quite old. On the one hand, MongoDB has evolved significantly since then; on the other hand, the projects mentioned there have been abandoned. I am particularly sorry about the fate of Square's Cube.
Starting from v3.4 you can use $objectToArray to achieve runtime grouping as follows:
db.collection.aggregate([
    {$project: {
        date: 1, system: 1, type: 1,
        hour: {$objectToArray: "$values"}
    }},
    {$unwind: "$hour"},
    {$project: {
        date: 1, system: 1, type: 1,
        hour: "$hour.k",
        minute: {$objectToArray: "$hour.v"}
    }},
    {$unwind: "$minute"},
    {$project: {
        date: 1, system: 1, type: 1, hour: 1,
        minute: "$minute.k",
        second: {$objectToArray: "$minute.v"}
    }},
    {$unwind: "$second"},
    {$project: {
        date: 1, system: 1, type: 1, hour: 1, minute: 1,
        second: "$second.k",
        value: "$second.v"
    }}
])
This gives you 1 document per second in the following format:
{
    "_id" : ObjectId("59fc57d75bc7315366b78799"),
    "date" : ISODate("2017-11-03T00:00:00.000Z"),
    "system" : "192-168-1-30",
    "type" : "memory",
    "hour" : "11",
    "minute" : "49",
    "second" : "43",
    "value" : 62171000
}
You can then apply further aggregation stages to group by second, minute, or hour.
The problem is that this is quite expensive, and the fact that hours, minutes, and seconds are strings only makes them more complex to manipulate.
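For example, a per-minute average could be computed by appending a $group stage like the one below. This is a sketch: the variable name is mine, and the field names assume the projection from the pipeline above.

```javascript
// Hypothetical follow-up stage: average the per-second values per minute.
// Appended after the $objectToArray/$unwind pipeline, e.g.:
// db.collection.aggregate([/* ...previous stages... */, perMinuteAvg])
const perMinuteAvg = {
  $group: {
    // one bucket per day / system / type / hour / minute
    _id: { date: "$date", system: "$system", type: "$type",
           hour: "$hour", minute: "$minute" },
    avg: { $avg: "$value" },  // average of the per-second readings
    samples: { $sum: 1 }      // how many seconds contributed
  }
};
```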
It would be way simpler to store 1 document per second in the first place.
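For illustration, a per-second layout could look like the sketch below (the document shape and variable names are my assumption). With one reading per document and a real timestamp, grouping at any granularity becomes a plain $group, here using $dateToString to truncate the timestamp to the minute:

```javascript
// Hypothetical per-second document: one reading per document,
// with the full timestamp stored in `date`.
const reading = {
  date: new Date("2017-11-03T11:49:43Z"),
  system: "192-168-1-30",
  type: "memory",
  value: 62171000
};

// Grouping per minute is then a single stage, e.g.:
// db.readings.aggregate([perMinute])
const perMinute = {
  $group: {
    _id: {
      system: "$system", type: "$type",
      // truncate the timestamp to minute precision
      minute: { $dateToString: { format: "%Y-%m-%dT%H:%M", date: "$date" } }
    },
    avg: { $avg: "$value" }
  }
};
```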
Upvotes: 3