Jiechao Li

Reputation: 366

Mongo Schema Design time series data

I am working on a web application that uses MongoDB, and I have some questions about schema design.

What I want to do is use Mongo to store energy consumption data for each user. For each user, we will have electricity consumption data, where each reading consists of a timestamp and a consumption value.

So the question is how to store them in Mongo, and I can see two ways of doing it.

  1. Put everything in one collection. Each document would look like this:

    {"user_id": "e211a233-808f-fc43-0800-c05650001785","Value": 274,"Time": 1314691200}

    So each user may have thousands of readings, and we have thousands of users, so there would be tens of millions of documents in one collection.

  2. Put the data of each user in its own collection. So we would have thousands of collections, with thousands of documents in each collection.

Can anyone tell me which approach is better in terms of performance?

Upvotes: 2

Views: 2197

Answers (3)

vizog

Reputation: 311

For anyone new who comes across this question:

MongoDB has some very useful video tutorials on this specific problem; see the following links:

part 1

part 2

part 3

Upvotes: 2

cirrus

Reputation: 5662

Option 1 will leverage your indexes and scale out well. It will be much easier to query and update efficiently than massive documents that are always changing. It will also make your queries much easier if you plan on aggregating that data in the future. Specifically, running the Aggregation Framework over individual documents is much more efficient than over arrays within documents, which have to be unwound first.
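To make the aggregation point concrete, here is a hedged sketch. The field names come from the question's example document; the collection name `consumptions` and the user IDs are assumptions. The shell pipeline for a per-user total would be a single `$group` stage (no `$unwind` needed, since each reading is already its own document); the plain JavaScript below mirrors what that `$group` computes over some sample documents:

```javascript
// Shell equivalent (collection name is an assumption):
//
//   db.consumptions.aggregate([
//     { $group: { _id: "$user_id", total: { $sum: "$Value" } } }
//   ])
//
// Plain-JavaScript mirror of that $group stage over sample readings:
const docs = [
  { user_id: "user-a", Value: 274, Time: 1314691200 },
  { user_id: "user-a", Value: 300, Time: 1314694800 },
  { user_id: "user-b", Value: 120, Time: 1314691200 }
];

function totalByUser(readings) {
  const totals = {};
  for (const d of readings) {
    // $sum over the group keyed by user_id
    totals[d.user_id] = (totals[d.user_id] || 0) + d.Value;
  }
  return totals;
}

console.log(totalByUser(docs)); // { 'user-a': 574, 'user-b': 120 }
```

With a compound index on `{ user_id: 1, Time: 1 }`, per-user, time-ranged queries over the same flat collection are served from the index rather than a collection scan.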

Also, if you planned on having in the region of 150K entries per user, keeping them all in one document would exceed the 16MB single-document limit, so I think you're almost always better off with one document per reading in a big collection, as per option 1.

[Update]

Looking again, I see that you haven't mentioned what queries you would run over the data; that's the key to this decision. But given that your data looks historical, it points more and more towards putting the readings into millions of individual documents. Map-Reduce would be your friend here for analysis.
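As a hedged sketch of the Map-Reduce idea: in the shell you would pass a map and a reduce function to `db.collection.mapReduce(...)`. Both are plain JavaScript, so the summing logic can be shown (and checked) directly; the field names follow the question, while the grouping-by-user analysis itself is an assumed example.

```javascript
// map: for each reading, emit (user_id, Value).
// Inside mapReduce, `this` is the current document and `emit` is
// provided by the server, so this function only runs server-side.
function map() {
  emit(this.user_id, this.Value);
}

// reduce: sum all values emitted for one user.
function reduce(key, values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) total += values[i];
  return total;
}

console.log(reduce("user-a", [274, 300])); // 574
```

Note that reduce must be associative and idempotent, because the server may call it repeatedly on partial results.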

Upvotes: 1

snez

Reputation: 2480

You can go with option 1 and also shard your data across multiple nodes for performance.

Alternatively, if it's an option, I would personally keep a daily entry for each user and then use:

db.coll.update( 
  { _id : userId, date: '12/11/2012' }, 
  { $inc : { consumption : value } },
  true // upsert: create the document if it does not exist; $inc then starts from 0
)
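A minimal sketch of what that upsert-plus-`$inc` accumulates, using an in-memory object to stand in for the collection (the field names follow the update above; the sample values are made up):

```javascript
const coll = {}; // stand-in for the collection, keyed by _id + date

function upsertInc(userId, date, value) {
  const key = userId + "|" + date;
  if (!(key in coll)) {
    // upsert branch: document is created; $inc treats the missing
    // field as 0, so the first increment starts from zero
    coll[key] = { _id: userId, date: date, consumption: 0 };
  }
  coll[key].consumption += value; // the $inc itself
}

upsertInc("user-a", "12/11/2012", 274);
upsertInc("user-a", "12/11/2012", 300);
console.log(coll["user-a|12/11/2012"].consumption); // 574
```

So each user ends up with one small document per day holding a running total, rather than one document per reading.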

If you will not be querying the data too often, you can also push the entries into a single daily document in a days collection, like so:

db.days.update( { day: '12/11/2012' }, 
  { $addToSet : 
    { todaysConsumptions : { userId : id, consumption : value, time : timestamp } } 
  },
  true // upsert the day's document if it does not exist yet
)

The way to query data stored with this last method is to use the aggregation framework with the $unwind operator on the todaysConsumptions field. $unwind essentially turns an embedded array into a stream of documents, one per array element, which can then be grouped, summed, counted, etc.
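A hedged sketch of what `$unwind` does to one such daily document (field names follow the answer; the sample data is made up). The shell pipeline would be:

```javascript
// Shell equivalent:
//
//   db.days.aggregate([
//     { $unwind: "$todaysConsumptions" },
//     { $group: { _id: "$todaysConsumptions.userId",
//                 total: { $sum: "$todaysConsumptions.consumption" } } }
//   ])
//
// Plain-JavaScript mirror of the $unwind step:
const day = {
  day: "12/11/2012",
  todaysConsumptions: [
    { userId: "user-a", consumption: 274, time: 1314691200 },
    { userId: "user-b", consumption: 120, time: 1314691260 }
  ]
};

// One output document per array element; the array field is replaced
// by that element, all other fields are kept as-is.
function unwind(doc, field) {
  return doc[field].map(function (elem) {
    const copy = Object.assign({}, doc);
    copy[field] = elem;
    return copy;
  });
}

console.log(unwind(day, "todaysConsumptions").length); // 2
```

After the unwind, each embedded reading is its own document again, so the same `$group`/`$sum` stages used for the flat design apply.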

Upvotes: 1
