Reputation: 115
I have multiple tables that I want to keep their deleted data.
I thought of two options to achieve that:
Create new table called deleted_x and when deleting from x, immediatly insert to deleted_x.
Advantage : querying from only one table.
Disadvantages :
Have a column called is_deleted
and put it in the partition key in each of these tables and set it to true when deleting a row.
Advantage : One table structure
Disadvantage : mention is_deleted
in all queries from table
Are there any performence considerations I should think of additionally? Which way is the better way?
Upvotes: 6
Views: 1965
Reputation: 5249
Are there any performence considerations I should think of additionally?
You will effectively double the write load and storage size for your cluster by inserting your data twice. This includes compactions, repairs, bootstrapping new nodes and backups.
Which way is the better way?
Let me suggest a 3rd option instead.
all_data
that contains each row and will never be deleted fromactive_data
using the same partition key. This table will only contain non-deleted rows (Edit: but not any data at all, just the key!).active_data
before reading from all_data
will allow you to only read non-deleted rowsUpvotes: 0
Reputation: 4426
Option #1 is awkward, but it's probably the right way to do things in Cassandra. You could issue the two mutations (one DELETE, and one INSERT) in a single batch, and guarantee that both are written.
Option #2 isn't really as easy as you may expect if you're coming from a relational background, because adding an is_deleted
column to a table in Cassandra and expecting to be able to query against it isn't trivial. The primary reason is that Cassandra performs significantly better when querying against the primary key (partition key(s) + optional clustering key(s) than secondary indexes. Therefore, for maximum performance, you'd need to model this as a clustering key - doing so then prohibits you from simply issuing an update - you'd need to delete + insert, anyway.
Option #2 becomes somewhat more viable in 3.0+ with Materialized Views - if you're looking at Cassandra 3.0+, it may be worth considering.
Upvotes: 3