How should I partition metering data in Cassandra?

Question

I have built an application receiving metering data (for example, the current temperature of a room) from multiple devices (in this example, multiple rooms).

I receive metering data every 15 minutes. My application calculates the difference between the current temperature and the previously received one and sends it to another application. I store the received metering data (timestamp, temperature, device_id, room, ...) in a Cassandra cluster.

Which field should I use for partitioning?

If I use the timestamp as the partition key, will it put all the load on the same node (disregarding replication)?

If I use the device_id/room, won't I get an unbounded partition? Maybe I could add a retention period?

Manish Khandelwal · Accepted Answer

The rule for Cassandra data modeling is to design your tables based on your queries, so prepare your queries first. For example, if you have queries like:

Get readings for a room.
Get readings for a device.

You can have two tables:

READING_BY_ROOM (partition key room id)
READING_BY_DEVICE (partition key device id)

This is the only way to design tables in Cassandra. Don't try to create tables the RDBMS way.
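
To make the two-table idea concrete, here is a minimal in-memory sketch in Python. It is not Cassandra code: each dict stands in for one table, and the dict key plays the role of the partition key. The class and field names (`Reading`, `ReadingStore`, `readings_for_room`, and so on) are hypothetical and chosen only for illustration. Sorting by timestamp on read is an assumption about how the readings would be ordered within a partition.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Reading:
    # Field names follow the question; the types are assumptions.
    timestamp: datetime
    device_id: str
    room: str
    temperature: float


class ReadingStore:
    """In-memory stand-in for two query-specific tables (illustration only)."""

    def __init__(self):
        # One dict per table; the dict key acts as the partition key.
        self.reading_by_room = {}
        self.reading_by_device = {}

    def write(self, reading):
        # Denormalize: the same reading is written once per query table.
        self.reading_by_room.setdefault(reading.room, []).append(reading)
        self.reading_by_device.setdefault(reading.device_id, []).append(reading)

    def readings_for_room(self, room):
        # "Get readings for a room": a lookup in a single partition.
        rows = self.reading_by_room.get(room, [])
        return sorted(rows, key=lambda r: r.timestamp)

    def readings_for_device(self, device_id):
        # "Get readings for a device": a lookup in a single partition.
        rows = self.reading_by_device.get(device_id, [])
        return sorted(rows, key=lambda r: r.timestamp)


store = ReadingStore()
store.write(Reading(datetime(2024, 1, 1, 12, 15), "dev-1", "kitchen", 21.0))
store.write(Reading(datetime(2024, 1, 1, 12, 0), "dev-1", "kitchen", 20.5))
store.write(Reading(datetime(2024, 1, 1, 12, 0), "dev-2", "hall", 18.0))
```

Each query is answered from exactly one key of one dict, which mirrors the point of the answer: every write is duplicated so that every read touches a single partition of the table built for it.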
