Reputation: 17082
I'm using MongoDB to handle time series data. This has been working fine so far because there isn't too much data yet, but I now need to identify what is required to scale to a larger volume. Today, more than 200k data points are received per day, one arriving every couple of seconds; that is not huge, but it should increase soon.
The data collection used is far from being efficient, as each piece of data (parentID, timestamp, value) creates its own document. I've seen several approaches that use one document to hold the time series for a whole hour (with, for instance, an inner array that keeps the data for each second). That is really great, but since the data I have to handle is not received regularly (it depends on the parentID), this approach might not be appropriate.
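For reference, this is roughly what the current one-document-per-reading shape looks like (the concrete values here are made up):

{
    "parentID" : "abc123",
    "timestamp" : ISODate("2015-02-09T19:00:20.309Z"),
    "value" : 1
}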
Among the data I receive:
- some are received every couple of seconds
- some are received every couple of minutes
For all of this data, the interval between two consecutive points is not necessarily constant.
Is there a better approach I could use to handle this data, for instance another data model, that would help the DB scale?
Today only one mongod process is running, and I'm wondering at what point sharding will really be needed. Any tips for this?
Upvotes: 1
Views: 1006
Reputation: 631
The solution to your problem is very well captured here:
http://bluxte.net/musings/2015/01/21/efficient-storage-non-periodic-time-series-mongodb
The basic idea, as already pointed out, is to capture a fixed number of events per document and to keep track of each document's start and end timestamps in another "higher-level" collection.
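A minimal sketch of that idea in the mongo shell, assuming a fixed bucket size of 200 readings per document; the collection and field names are illustrative, not taken from the article:

// One bucket document holds up to 200 readings for one parentID.
var bucketId = ObjectId()
db.buckets.insert({
    "_id" : bucketId,
    "parentID" : "abc123",
    "count" : 2,                // readings stored so far
    "readings" : [
        { "timestamp" : ISODate("2015-02-09T19:00:20.309Z"), "value" : 1 },
        { "timestamp" : ISODate("2015-02-09T19:03:25.874Z"), "value" : 4 }
    ]
})

// The "higher-level" collection records each bucket's time range,
// so range queries can locate the right buckets quickly.
db.bucketIndex.insert({
    "bucketId" : bucketId,
    "parentID" : "abc123",
    "start" : ISODate("2015-02-09T19:00:20.309Z"),
    "end" : ISODate("2015-02-09T19:03:25.874Z")   // updated as readings arrive
})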
Upvotes: 2
Reputation: 11671
You may still be able to reap the benefits of a preallocated document even if the readings aren't uniformly distributed. You can't structure each document by the time of the readings, but you can structure each document to hold a fixed number of readings:
{
    "type" : "cookies consumed",
    "0" : { "number" : 1, "timestamp" : ISODate("2015-02-09T19:00:20.309Z") },
    "1" : { "number" : 4, "timestamp" : ISODate("2015-02-09T19:03:25.874Z") },
    ...
    "1000" : { "number" : 0, "timestamp" : ISODate("2015-01-01T00:00:00Z") }   // preallocated placeholder slot
}
Depending on your use case, this structure might work for you and give you the benefit of updating preallocated documents with new readings, only allocating a brand-new document every N readings, for some large N.
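As a hedged sketch of how such a document could be filled in place: the extra "count" field and the "readings" collection name below are assumptions of mine, not part of the structure above:

// Read the current fill level, then write the next slot and advance
// the counter. Matching on "count" in the update filter guards against
// two writers filling the same slot concurrently.
var doc = db.readings.findOne({ "type" : "cookies consumed" })
if (doc.count < 1000) {
    var slot = {}
    slot["" + doc.count] = { "number" : 2, "timestamp" : new Date() }
    db.readings.update(
        { "_id" : doc._id, "count" : doc.count },
        { "$set" : slot, "$inc" : { "count" : 1 } }
    )
} else {
    // document is full: insert a fresh preallocated document and start over
}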
Upvotes: 2