Manuel

Reputation: 9522

Cassandra Replication Strategy Based on Primary Key for Archiving Old Data

I'm thinking about how to store IoT telemetry data.

I'd like to optimize my storage, taking IoT telemetry as an example. Recent data (e.g. the last 6 months) should stay hot and highly replicated. For older data I'd like to reduce the number of replicas and/or fully offload it to a lower-performance archive cluster.

I know about keyspace-based replication strategies, but that would require multiple keyspaces. I'd rather have replication driven by the primary key / shard key.
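For reference, this is the kind of two-keyspace setup I know is possible but would like to avoid (keyspace, datacenter, and host names below are just examples):

```python
from cassandra.cluster import Cluster

session = Cluster(["cassandra-host"]).connect()

# Hot data: fully replicated
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS telemetry_hot
    WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3}
""")

# Old data: single replica
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS telemetry_archive
    WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 1}
""")
```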

Is it possible to define a replication strategy based on data age or any other property?

If yes, how can this be achieved?

Thanks a lot in advance for your expertise.

Upvotes: 0

Views: 54

Answers (1)

Mark Allen

Reputation: 1

In general, Cassandra does not provide such functionality out of the box. You are right that replication strategies apply per keyspace, so you would need to implement some external job that reads the data and writes it to another keyspace. Also, Cassandra is not an efficient database for storing time series on disk, so it is much better to use something like Parquet files as cold storage. Usually this looks like the following:

Cold storage: (Cassandra)--read_data--(Spark Job)--write--(S3 parquet)
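A minimal PySpark sketch of such an archive job, assuming a table `telemetry.readings` with a timestamp column `ts`, an S3 bucket, and the spark-cassandra-connector on the classpath (all names here are placeholders):

```python
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("telemetry-archiver")
    # assumes spark-cassandra-connector is on the classpath, e.g. via
    # --packages com.datastax.spark:spark-cassandra-connector_2.12:3.4.1
    .config("spark.cassandra.connection.host", "cassandra-host")
    .getOrCreate()
)

hot = (
    spark.read.format("org.apache.spark.sql.cassandra")
    .options(keyspace="telemetry", table="readings")
    .load()
)

# Everything older than six months goes to Parquet on S3, partitioned by day
cutoff = F.expr("current_timestamp() - INTERVAL 6 MONTHS")
(
    hot.filter(F.col("ts") < cutoff)
    .withColumn("day", F.to_date("ts"))
    .write.mode("append")
    .partitionBy("day")
    .parquet("s3a://my-archive-bucket/telemetry/")
)
```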

Cold storage restore: (S3 parquet)--read_data--(Spark Job)--write--(Cassandra)
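And the corresponding restore direction, reading a date range back from Parquet and appending it into Cassandra (same placeholder names as above):

```python
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("telemetry-restore")
    .config("spark.cassandra.connection.host", "cassandra-host")
    .getOrCreate()
)

archived = spark.read.parquet("s3a://my-archive-bucket/telemetry/")

(
    archived.where(F.col("day") >= "2023-01-01")  # restore only the range you need
    .drop("day")  # drop the partitioning column added during archiving
    .write.format("org.apache.spark.sql.cassandra")
    .options(keyspace="telemetry", table="readings")
    .mode("append")
    .save()
)
```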

A job that deletes the archived data from Cassandra should be developed separately, because that is also a non-trivial task. After the delete you should run nodetool cleanup on each node, and keep in mind that disk space is only reclaimed once the resulting tombstones are compacted away after gc_grace_seconds.
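A sketch of such a delete job with the Python driver, assuming the hypothetical schema `readings(device_id, ts, ...)` partitioned by `device_id` and clustered by `ts` (range deletes on a clustering column require Cassandra 3.0+):

```python
from datetime import datetime, timedelta, timezone
from cassandra.cluster import Cluster

session = Cluster(["cassandra-host"]).connect("telemetry")

# Roughly six months ago
cutoff = datetime.now(timezone.utc) - timedelta(days=183)

# Range delete per partition: drops all rows older than the cutoff
delete_old = session.prepare(
    "DELETE FROM readings WHERE device_id = ? AND ts < ?"
)

for (device_id,) in session.execute("SELECT DISTINCT device_id FROM readings"):
    session.execute(delete_old, (device_id, cutoff))
```

Note that the DISTINCT scan over all partitions is expensive on a large cluster; in practice you would drive the loop from a known list of device IDs.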

Upvotes: 0
