MongoDB schema design for a measurement acquisition system

Question

Introduction:

The system currently consists of 2 devices.
Each device has 10 nodes that measure data. That data is written to the DB every 5 seconds.
For now I have estimated a maximum read:write ratio of 50:1 for this setup. This is very likely to change when new devices/nodes are introduced.
I'm currently embedding everything in one document (example here: http://pastebin.com/4dATY5NF).
My 3 main use-cases are:
- adding a measurement to the DB
- getting the last measurement from all nodes (for 5 nodes this would return 5 measurements)
- getting a list of measurements from a given day (a long list of measurements matching the input date/time criteria).

Problem:

My main concern is that the documents grow a lot over time (every new measurement is inserted into an embedded array), and that the general document structure makes the measurements hard to query for a given date/time range.

For example, even if only one node reported data every 5 seconds, the number of measurements in its embedded array for a single day would be 24*60*60/5 = 17280. Having 5 nodes reporting for a 30-day month gives 5 embedded arrays with 518400 elements each (in one document!). The longer the device works, the more entries accumulate in the embedded array of measurements for each attached node.
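The arithmetic above can be checked with a few lines of Python. This is only a sanity check; the 30-day month is an assumption (it is the value that reproduces the 518400 figure), and the names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60
REPORT_INTERVAL_S = 5   # one measurement every 5 seconds, as stated above
DAYS_PER_MONTH = 30     # assumption: a 30-day month
NODES = 5               # number of nodes used in the example above


def measurements_per_node(days, interval_s=REPORT_INTERVAL_S):
    """Number of measurements one node produces over the given number of days."""
    return days * SECONDS_PER_DAY // interval_s


per_day = measurements_per_node(1)                 # elements added to one array per day
per_month = measurements_per_node(DAYS_PER_MONTH)  # elements in one array after a month
total_in_document = NODES * per_month              # elements across all arrays in the document

print(per_day, per_month, total_in_document)
```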

Questions:

How does the estimated read/write ratio influence the decision of embedding vs linking?
Is it justified in this case to sacrifice all the good things about embedding and split the data into 2 collections?

What I have been thinking of is, for example, one collection for device/node configuration (embedding the information here, since there isn't much of it) and a second collection only for measurements (with references to the device and node each one came from). I think this will cost a few more calls to the DB, but will be better in terms of performance and memory usage.
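To make the two-collection idea concrete, here is a rough sketch of what the documents might look like, written as plain Python dicts. All field names (`nodes`, `device_id`, `node_id`, `ts`, `value`) and the sample values are hypothetical; they are not taken from the pastebin example.

```python
from datetime import datetime, timezone


def make_device_doc(device_id, node_ids):
    """Configuration document: small and size-bound, so the nodes are embedded."""
    return {"_id": device_id, "nodes": [{"node_id": n} for n in node_ids]}


def make_measurement_doc(device_id, node_id, value, ts=None):
    """One document per measurement, referencing its device and node by id."""
    return {
        "device_id": device_id,
        "node_id": node_id,
        "ts": ts if ts is not None else datetime.now(timezone.utc),
        "value": value,
    }


# One device with its 10 nodes, and a single measurement from node 3.
device = make_device_doc("device-1", range(1, 11))
sample = make_measurement_doc(
    "device-1", 3, 21.5, ts=datetime(2013, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
)
```

With a driver such as pymongo, the first dict would go into a configuration collection and the second into a measurements collection via `insert_one`; the collection names are up to you.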

Remon van Vliet · Accepted Answer

In order:

It doesn't. Embedding an infinitely growing structure in a single document does not scale and should be avoided. It is preferable by far to store each measurement as its own document. Once you do that, the read/write ratio is not very relevant, although write performance will be more stable (MongoDB has to move growing documents regularly, which can cause write latency spikes).
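With one document per measurement, the "measurements from a given day" use case becomes a plain range query. The following is a minimal sketch, assuming hypothetical field names `device_id`, `node_id` and `ts` (a UTC datetime); it only builds the filter and index specification, so it runs without a database.

```python
from datetime import date, datetime, timedelta, timezone


def day_range_filter(device_id, node_id, day):
    """Filter matching one node's measurements within a single UTC day.

    The range is half-open: start of the day inclusive, start of the next day exclusive.
    """
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return {
        "device_id": device_id,
        "node_id": node_id,
        "ts": {"$gte": start, "$lt": end},
    }


# Hypothetical compound index: equality fields first, timestamp last.
MEASUREMENT_INDEX = [("device_id", 1), ("node_id", 1), ("ts", -1)]

query = day_range_filter("device-1", 3, date(2013, 1, 1))
```

With pymongo, `collection.create_index(MEASUREMENT_INDEX)` would create the index once and `collection.find(query)` would return that day's measurements.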
There are actually not a lot of "good things" about embedding. It complicates querying, there's no way to get a small part of the embedded structure, and so forth. As such, it is not only justified but highly encouraged to move to two separate collections. In future-proof schemas you embed if, and only if, you always need the entire embedded structure whenever you query the top-level document, and that embedded structure is size-bound regardless of how many users or how much data your system has to deal with.
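As an illustration of fetching only a small part of the data once measurements live in their own collection, the "last measurement from all nodes" use case could be expressed as an aggregation pipeline. This is a sketch under the assumption of hypothetical field names `device_id`, `node_id`, `ts` and `value`; the function only builds the pipeline.

```python
def latest_per_node_pipeline(device_id=None):
    """Aggregation pipeline returning the newest measurement of every node.

    If device_id is given, only that device's measurements are considered.
    """
    pipeline = []
    if device_id is not None:
        pipeline.append({"$match": {"device_id": device_id}})
    pipeline += [
        # Newest first, so $first in the next stage picks the latest values.
        {"$sort": {"ts": -1}},
        {
            "$group": {
                "_id": {"device_id": "$device_id", "node_id": "$node_id"},
                "ts": {"$first": "$ts"},
                "value": {"$first": "$value"},
            }
        },
    ]
    return pipeline


pipeline = latest_per_node_pipeline("device-1")
```

With pymongo this would be passed to `collection.aggregate(pipeline)`. Sorting a large collection is not free, so as the data grows it may be cheaper to issue one sorted query with a limit of 1 per node instead.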
