Reputation: 506
I'm looking for an effective way to retain data long-term in Cassandra.
For example: I want to keep the last 2 years of data available for real-time queries, while the 2 years prior to that would not be queried but must be retained for audit purposes, should the need arise.
Looking forward to real options, if any.
Version: Cassandra 3.11 / 4.0
Upvotes: 0
Views: 50
Reputation: 125
Without knowing your access pattern (specifically the primary key) and write pattern (batch/streaming), here are several suggestions to solve your problem:
1/ Historicised table with TTLs
Design a table that can store data by temporal keys and apply a TTL of 4 years to each row written.
For audit-only data (older than 2 years), filter on the date in your back-end API so that it is not returned by real-time queries.
Suggested PK+CC : ((year, month, day) , dataId)
Note that the TTL is applied at the column level (see this Medium article).
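As a minimal sketch of this design (the table and column names here are assumptions, not from the question), the temporal partition key and the per-write TTL could look like:

```cql
-- Hypothetical historicised table, partitioned by day
CREATE TABLE IF NOT EXISTS events_by_day (
    year    int,
    month   int,
    day     int,
    data_id uuid,
    payload text,
    PRIMARY KEY ((year, month, day), data_id)
);

-- Write each row with a 4-year TTL (4 * 365 * 86400 = 126144000 seconds)
INSERT INTO events_by_day (year, month, day, data_id, payload)
VALUES (2021, 6, 1, uuid(), 'example')
USING TTL 126144000;
```

Alternatively, the table can be created `WITH default_time_to_live = 126144000` so every write inherits the TTL without each client having to specify it.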
2/ Historicised table with scheduled purge
Same as the previous suggestion, but use a scheduled job instead of TTLs to purge data older than 4 years.
To achieve that, use the Spark Cassandra Connector to read the historicised table from Cassandra, identify the rows to delete, and purge them from Cassandra.
I personally recommend using the DataFrame API over RDDs (see the official GitHub project).
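One advantage of putting the date in the partition key is that the purge itself stays cheap: whole day partitions can be deleted with a single partition-level delete, whether issued from a Spark job or a simple scheduler. A sketch, assuming a hypothetical `events_by_day` table partitioned by `(year, month, day)`:

```cql
-- Remove one full day partition that has aged past the 4-year retention window
DELETE FROM events_by_day
WHERE year = 2019 AND month = 6 AND day = 1;
```

Partition-level deletes like this create a single partition tombstone instead of one tombstone per row, which is much friendlier to compaction and subsequent reads.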
Upvotes: 1