pherris
pherris

Reputation: 17743

mongodb time series data modeling

Newbe question on how to query MongoDB. I want to take multiple objects received throughout each day and aggregate them together into a single day record. The document structure will eventually contain 365 days worth of data (per year, with a new document per year possibly) in the following format:

{
"2012": {
    "9": {
        "1": {
            "attribute1": 42.1,
            "attribute2": 5.895,
            "attribute3": 2347,
            "attribute...n": 33076.875,
            "startFuel": [
                32976.75,
                33018.875,
                33041.375
            ],
            "startOdometer": [
                202748.9,
                202973,
                203100.2
            ],
            "endOdometer" : 209855.6
        },
        "10": {
            "attribute1": 363.90000000000003,
            "attribute2": 59.242000000000004,
            "attribute3": 20284,
            "attribute...n": 34335.375,
            "startFuel": [
                34161,
                34214.125,
                34245.25,
                34280.875
            ],
            "startOdometer": [
                209855.6,
                210166.4,
                210348.2,
                210555.7
            ],
            "endOdometer" : 210655.7
        }
    }
},
"_id": ObjectId("50b6838a6ef6585a9e51f2af"),
"key1": null,
"keu2": -1,
"key3": 572755
}

As you can see, each document belongs to one entity, a year contains each month and each month has each day (as long as there is data for that day/month/year).

I am struggling with how to query this document format. I'd like to use the aggregation framework to average values within a document (for a single entity) and to aggregate them across multiple documents/entities over the same time period.

For example I'd like to find the average miles driven each day in the month of august (for each entity, for each day in the month, subtract the lowest startOdometer from the endOdometer and average the values over the month). (BTW: the startOdometer is an array to allow me to push the values without pre-querying the document to see if it is already present).

We're having a debate internally whether we need to use map/reduce to create ALL of the aggregate views into our data or if we can use map/reduce to create the day summaries and use the aggregation framework to pull the data together. I would like to know how this data structure would support that type of query, or what other data structure might be more appropriate to allow us to leverage the aggregation framework to summarize our data.

Upvotes: 1

Views: 679

Answers (1)

JohnnyHK
JohnnyHK

Reputation: 312129

A dynamic schema like this is very hard to query. You'd be much better off reworking your schema to use fixed key names and leave the dynamic content to the values.

Upvotes: 1

Related Questions