ethrbunny
ethrbunny

Reputation: 10469

Cassandra: making a data model / schema

(Not sure what its called... model.. schema.. super model?)

I have 'n' (uniquely id'd) sensors in 'm' (uniquely id'd) homes. Each of these fires 0 to 'k' times / day (in blocks of 1-5). This data is currently stored in MySQL with a table for each 'home' and a structure of:

time stamp
sensor id
firing count

Im having trouble wrapping my mind around a 'nosql' model of this data that would allow me to find counts of firings by home, time, or sensor.

.. Or maybe this isn't the right kind of data to push to nosql? Our current server is bogging down under the load ( hundreds of millions of rows x hundreds of homes ). Im very interested in finding a data store that allows the scalability of cassandra.

Upvotes: 3

Views: 497

Answers (2)

libjack
libjack

Reputation: 6443

It depends. Think "Query first" approach:

  1. identify the queries
  2. model the data

So, while you might have a Column Family which is your physical model, you will also have one or more which provide the data as it is queried. And, you can further take advantage of Cassandra features, such as:

Column Names can contain data. You don't have to store a value, each of the names could be a timestamp, for example

It is well suited to store thousands of columns for each key and the columns will remain sorted and can access in forward or reverse order; so, to continue above example, can easily get list of all timestamps for a sensor

Composite data types allow you to combine multiple bits of data into keys, names, or values. e.g. combine house id and sensor id

Counter Columns provide an simple value increment, even for the initial value, so just always a write operation.

Indexes can be defined on static column names which in effect, provides a reverse Column Family with the key as the result, just be careful of bucket size (e.g. might not want values to millisec)

Upvotes: 2

rodrigoap
rodrigoap

Reputation: 7480

To store firing count by sensor and house:

House_Sensors     <-Column family 
  house_id        <-Key
    sensor_id     <-Column name
    firing_count  <-Column value

Data represented in JSON-ish notation

House_Sensors = {
 house_1 : {
  sensor_1: 3436,
  sensor_2: 46,
  sensor_3: 99,
  ...
 },
 house_2 : {
  sensor_7: 0,
  sensor_8: 444,
  ...
 },
 ...
}

You may want to define another column family with sensor_id as key to store the firing timestamp.

Think what queries you need when designing the schema and denormalize as needed. Repeat data, Cassandra inserts are very fast.

The timestamp of the firing is not stored in House_Sensor column family. Create a new column family for that with sensor_id as key.

This way you can use House_Sensor family to query firing count and what sensor belongs to each house. Use the other column family to query the firing timestamp.

Upvotes: 1

Related Questions