Reputation: 993
We have a requirement where we have a relational database table T1 with 20 fields. We capture all changes/updated in this table happening on various fields (commit logs) and ingest/apply those in corresponding table CT1 in Cassandra, i.e. Cassandra table CT1 has exact same schema/fields as T1 (relational DB table).
For Cassandra table CT1 we have additional requirement that we want to capture/store/retrieve all changed values of all fields meaning if Field f1 changed 20 times all its changed-values with the corresponding change-timestamp should be saved. Similarly, if Field f3 changed 100 times all its values should be saved. Note: different fields change at different times and each field changes the variable number of times, meaning one field may change 1000 times a day while some other field may never change at all.
This is some kind of time-series data for each field. So I want to know how to represent such data model efficiently in Cassandra? Another requirement is I want to efficiently retrieve the most recent value of all fields in the table.
For example:
if f1 changed 10 times in a day, for f1 I want its most recent value to be returned. If f2 changed most recently a week back then for f2 that most recent value should be returned, so on for other fields.
Upvotes: 0
Views: 1452
Reputation: 810
You can find more details about time-series storage in Cassandra by reading this and this articles. So to find the most recently values easily, you can create the table structure as following:
CREATE TABLE table1_history (
column_name text,
change_time timestamp,
column_value text,
PRIMARY KEY (column_name, change_time),
) WITH CLUSTERING ORDER BY (change_time DESC)
So you can find the latest value with a common selection data from your table.
Upvotes: 1