Reputation: 506
I'm looking for an effective way to retain data long-term in Cassandra.
For example: I want to keep the last 2 years of data available for real-time queries, while the 2 years prior to that would not be queried but must be retained for audit purposes, should the need arise.
Looking forward to real options, if any.
Version: Cassandra 3.11 / 4.0
Upvotes: 0
Views: 50
Reputation: 125
Without knowing your access pattern (specifically the primary key) and write pattern (batch/streaming), here are several suggestions to solve your problem:
1/ Historicised table with TTLs
Design a table that can store data by temporal keys and apply a TTL of 4 years to each row written.
For audit-only data (older than 2 years), filter on the date in your back-end API so that it is not returned by real-time queries.
Suggested PK+CC : ((year, month, day) , dataId)
Note that the TTL is applied at the column level (see this Medium article).
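As a minimal sketch of this design (the table and column names here are assumptions, not from the question), the temporal partition key and the per-write TTL could look like:

```cql
-- Hypothetical historicised table, partitioned by day
CREATE TABLE IF NOT EXISTS events_by_day (
    year    int,
    month   int,
    day     int,
    data_id uuid,
    payload text,
    PRIMARY KEY ((year, month, day), data_id)
);

-- Write each row with a 4-year TTL (4 * 365 * 86400 = 126144000 seconds)
INSERT INTO events_by_day (year, month, day, data_id, payload)
VALUES (2021, 6, 1, uuid(), 'example')
USING TTL 126144000;
```

Alternatively, the table can be created `WITH default_time_to_live = 126144000` so every write inherits the TTL without each client having to specify it.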
2/ Historicised table with scheduled purge
Same as the previous suggestion, but use a scheduled job instead of TTLs to purge data older than 4 years.
To achieve that, use the Spark Cassandra Connector to read the historicised table from Cassandra, identify the rows to delete, and purge them from Cassandra.
I personally recommend using the DataFrame API over RDDs (see the official GitHub project).
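One advantage of putting the date in the partition key is that the purge itself stays cheap: whole day partitions can be deleted with a single partition-level delete, whether issued from a Spark job or a simple scheduler. A sketch, assuming a hypothetical `events_by_day` table partitioned by `(year, month, day)`:

```cql
-- Remove one full day partition that has aged past the 4-year retention window
DELETE FROM events_by_day
WHERE year = 2019 AND month = 6 AND day = 1;
```

Partition-level deletes like this create a single partition tombstone instead of one tombstone per row, which is much friendlier to compaction and subsequent reads.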
Upvotes: 1