Coder
Coder

Reputation: 3262

Normalization in Cassandra for the specified use case?

I have a device table (say 'device' table) which has the static fields with current statistics and I have another table (say 'devicestat' table) which has the statistics of that device collected for every one minute and sorted by timestamp like below.

Example :

CREATE TABLE device(
   "partitionId" text,
   "deviceId" text,
   "name" text,
   "totalMemoryInMB" bigint,
   "totalCpu" int,
   "currentUsedMemoryInMB" bigint,
   "totalStorageInMB" bigint,
   "currentUsedCpu" int,
   "ipAddress" text,
    primary key ("partitionId","deviceId"));


CREATE TABLE devicestat(
   "deviceId" text,
   "timestamp" timestamp,
   "totalMemoryInMB" bigint,
   "totalCpu" int,
   "usedMemoryInMB" bigint,
   "totalStorageInMB" bigint,
   "usedCpu" int
    primary key ("deviceId","timestamp"));

where,

currentUsedMemoryInMB & currentUsedCpu => Hold the most recent statistics

usedMemoryInMB & usedCpu => Hold the most and also old statistics based on time stamp.

Could somebody suggest me the correct approach for the following concept?

So whenever I need static data with the most recent statistics I read from device table, Whenever I need history of device staistical data I read from the devicestat table

This looks fine for me, But only problem is I need to write the statitics in both table, In case of devicestat table It will be a new entry based on timestamp but In case of device table, we will just update the statistics. What is your thought on this, Does this need to be maintained in only the single stat table or Is it fine to update the most recent stat in device table too.

Upvotes: 1

Views: 140

Answers (1)

Alex M981
Alex M981

Reputation: 2374

in Cassandra the common approach is to have a table(ColumnFamily) per query. And denormalization is also a good practice in Cassandra. So it's ok to keep 2 column families in this case.

Another way to get the latest stat from devicestat table is make data be DESC sorted by timestamp:

CREATE TABLE devicestat(
   "deviceId" text,
   "timestamp" timestamp,
   "totalMemoryInMB" bigint,
   "totalCpu" int,
   "usedMemoryInMB" bigint,
   "totalStorageInMB" bigint,
   "usedCpu" int
    primary key ("deviceId","timestamp"))
WITH CLUSTERING ORDER BY (timestamp DESC);

so you can query with limit 1 when you know deviceId

select * from devicestat where deviceId = 'someId' limit 1;

But if you want to list last stat of devices by partitionId then your approach with updating device table with latest stat is correct

Upvotes: 1

Related Questions