Data Model design approach for one use case in Cassandra

Question

I am in need of the best approach for the following use case,

I have 'Device' table (Only one Partition Id : 'Device') and I have another table 'DeviceStatistics' (Partition Id : 'deviceId' so that this table will have as many partition as number of devices) which means for every device there will be a statistics collected for every minute.

CREATE TABLE device(
   "partitionId" text,"name" text,"deviceId" text, ..., primary key ("partitionId","name","deviceId"));

where partitionId - it is a constant ('device')

CREATE TABLE deviceStatistics (    
"deviceId" text,     
"timestamp" timestamp, ...,
primary key ("deviceId","timestamp")) with clustering order by ("timestamp" DESC);

where 'deviceId' - it is the partition key and under each partition will have the list of timestamp entries

Till this it is fine, Because I just need the following queries,

1) select * from device where partitionId = 'device'
   - which list all the devices available. 
2) select * from deviceStatistics where deviceId = 'deviceId_1'
   - which list all the device statistics for a deviceId
3) select * from deviceStatistics where deviceId = 'deviceId_1' LIMIT 1
   - which gets the most recent statistics for a deviceId

Now I need the solution for the following use case,

I need to collect the cluster level statistics which means I need to collect all the device statistics for the timestamp,

(i.e) If the deviceStatistics for 4 devices are available for the timestamp then I need to collect all the four statistics for a timestamp and add in device group level.

which means my DeviceGroupstatistics is the aggregation of all the device statistics for the timestamp.

Now the problem is, Since I have 'deviceId' as the partitionId for the deviceStatistics table, I need to perform this query (select * from deviceStatistics where deviceId = 'deviceId' LIMIT 1) for all the deviceIds. So Lets say I have 1000 devices, then I need to trigger this query for all the 1000 devices for every minute.

Is there a better design for this?

Alex Ott · Accepted Answer

I would recommend to have separate table where timestamp will be the partition key, and device ID is clustering key. Granularity of timestamp may depend on your application - for example, drop seconds & round to minutes, or something like.

You can implement storing the data from your application (preferred), or you can use materialized views (but they are experimental, and not always recommended to use).

Data Model design approach for one use case in Cassandra

Answers (2)

Related Questions