Radhika

Cassandra: deleting old data

In Cassandra, we only need 100 days of data for certain tables. However, we set the TTL only recently, so data written before that has no TTL and remains in the system as stale data. We are considering different approaches to remove the old data. One suggestion was to write a Spark job that identifies data older than a specific time frame and deletes it. Another idea was to create a new table containing only the last 100 days of data and drop the old table. However, I have a few doubts about the second approach:

  1. How do we rename a table that is still receiving live writes?
  2. How will Cassandra handle such a table? If I recreate the table with less data and rename it on one node (say, node 1), will the other nodes in the cluster automatically delete the older data from their copies, or will they sync with the table on node 1 and push all the older data back onto it?
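The Spark-job suggestion boils down to computing a cutoff 100 days in the past and deleting everything older than it. Below is a minimal plain-Python sketch of that selection logic, assuming a hypothetical table `my_keyspace.events` with partition key `id` and clustering column `created_at` (neither name comes from the question). A real job would read the rows with Spark (for example through the Spark Cassandra Connector) rather than from a list, and the range delete assumes Cassandra 3.0 or later, which allows deleting a range of clustering keys inside one partition:

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=100)

# Hypothetical table and columns. Because created_at is assumed to be a
# clustering column, one range delete per partition covers all stale rows.
RANGE_DELETE_CQL = (
    "DELETE FROM my_keyspace.events WHERE id = ? AND created_at < ?"
)


def retention_cutoff(now: datetime) -> datetime:
    """Rows written before this instant fall outside the 100-day window."""
    return now - RETENTION


def range_delete_params(rows, now: datetime):
    """Return (id, cutoff) bind values, one per partition holding stale rows.

    `rows` is an iterable of dicts with the hypothetical keys "id" and
    "created_at" (timezone-aware datetimes).
    """
    cutoff = retention_cutoff(now)
    stale_partitions = sorted({r["id"] for r in rows if r["created_at"] < cutoff})
    return [(pk, cutoff) for pk in stale_partitions]


if __name__ == "__main__":
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    rows = [
        {"id": "a", "created_at": now - timedelta(days=150)},
        {"id": "a", "created_at": now - timedelta(days=10)},
        {"id": "b", "created_at": now - timedelta(days=5)},
        {"id": "c", "created_at": now - timedelta(days=101)},
    ]
    # Partitions "a" and "c" hold rows older than the cutoff; "b" does not.
    for params in range_delete_params(rows, now):
        print(RANGE_DELETE_CQL, params)
```

Keep in mind that each DELETE writes a tombstone rather than freeing space immediately; the data is purged only by compaction after `gc_grace_seconds` has passed, so disk usage may not drop right away.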

I am new to Cassandra and would appreciate expert advice on this. Please suggest any better ways to handle it.

Answers (1)

Carlos Monroy Nieblas

Cassandra does not support renaming a table, so you will need to:

  1. create the new table with a different name
  2. ensure this table has the TTL clause
  3. load into it only the subset of records that you are interested in; this could be tricky, as the query depends on the table's schema (for example, is the timestamp column part of the clustering key?)
  4. update your application to point to the new table
  5. drop the old table
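Steps 1, 2, 3 and 5 above can be sketched as follows, assuming a hypothetical source table `my_keyspace.events(id, created_at, payload)` and a new table named `events_v2` (both are illustrations, not names from the question). The plain-Python code only builds the CQL and the bind values; executing them (for example with the DataStax Python driver's `session.prepare` and `session.execute`) is left out. Each copied row gets the TTL it has left rather than a fresh 100 days, so a row that is already 90 days old still expires 10 days later:

```python
from datetime import datetime, timedelta
from typing import Optional

RETENTION = timedelta(days=100)
TTL_SECONDS = int(RETENTION.total_seconds())  # 100 days = 8,640,000 seconds

# Steps 1 and 2: new table (hypothetical name and columns) with a table-level TTL.
CREATE_CQL = (
    "CREATE TABLE my_keyspace.events_v2 ("
    "id text, created_at timestamp, payload text, "
    "PRIMARY KEY ((id), created_at)) "
    f"WITH default_time_to_live = {TTL_SECONDS}"
)

# Step 3: per-row TTL so copied rows expire on their original schedule.
INSERT_CQL = (
    "INSERT INTO my_keyspace.events_v2 (id, created_at, payload) "
    "VALUES (?, ?, ?) USING TTL ?"
)

# Step 5: run only after the application reads and writes events_v2 (step 4).
DROP_CQL = "DROP TABLE my_keyspace.events"


def remaining_ttl(created_at: datetime, now: datetime) -> Optional[int]:
    """Seconds a row still has inside the retention window, or None if expired."""
    remaining = int((created_at + RETENTION - now).total_seconds())
    # In CQL a TTL of 0 removes expiry and a negative TTL is rejected,
    # so rows already past the window are skipped instead of copied.
    return remaining if remaining > 0 else None


def copy_params(rows, now: datetime):
    """Bind values for INSERT_CQL, one tuple per row still within 100 days.

    `rows` is an iterable of dicts with the hypothetical keys
    "id", "created_at" (timezone-aware datetime) and "payload".
    """
    params = []
    for row in rows:
        ttl = remaining_ttl(row["created_at"], now)
        if ttl is not None:
            params.append((row["id"], row["created_at"], row["payload"], ttl))
    return params
```

Copying every row with a fresh 100-day TTL would be simpler, but it would keep each migrated row up to 100 days longer than intended; computing the remaining TTL per row in the loader avoids that.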
