Reputation: 10469
(Not sure what its called... model.. schema.. super model?)
I have 'n' (uniquely id'd) sensors in 'm' (uniquely id'd) homes. Each of these fires 0 to 'k' times / day (in blocks of 1-5). This data is currently stored in MySQL with a table for each 'home' and a structure of:
time stamp
sensor id
firing count
Im having trouble wrapping my mind around a 'nosql' model of this data that would allow me to find counts of firings by home, time, or sensor.
.. Or maybe this isn't the right kind of data to push to nosql? Our current server is bogging down under the load ( hundreds of millions of rows x hundreds of homes ). Im very interested in finding a data store that allows the scalability of cassandra.
Upvotes: 3
Views: 497
Reputation: 6443
It depends. Think "Query first" approach:
So, while you might have a Column Family which is your physical model, you will also have one or more which provide the data as it is queried. And, you can further take advantage of Cassandra features, such as:
Column Names can contain data. You don't have to store a value, each of the names could be a timestamp, for example
It is well suited to store thousands of columns for each key and the columns will remain sorted and can access in forward or reverse order; so, to continue above example, can easily get list of all timestamps for a sensor
Composite data types allow you to combine multiple bits of data into keys, names, or values. e.g. combine house id and sensor id
Counter Columns provide an simple value increment, even for the initial value, so just always a write operation.
Indexes can be defined on static column names which in effect, provides a reverse Column Family with the key as the result, just be careful of bucket size (e.g. might not want values to millisec)
Upvotes: 2
Reputation: 7480
To store firing count by sensor and house:
House_Sensors <-Column family
house_id <-Key
sensor_id <-Column name
firing_count <-Column value
Data represented in JSON-ish notation
House_Sensors = {
house_1 : {
sensor_1: 3436,
sensor_2: 46,
sensor_3: 99,
...
},
house_2 : {
sensor_7: 0,
sensor_8: 444,
...
},
...
}
You may want to define another column family with sensor_id as key to store the firing timestamp.
Think what queries you need when designing the schema and denormalize as needed. Repeat data, Cassandra inserts are very fast.
The timestamp of the firing is not stored in House_Sensor column family. Create a new column family for that with sensor_id as key.
This way you can use House_Sensor family to query firing count and what sensor belongs to each house. Use the other column family to query the firing timestamp.
Upvotes: 1