nmakb

Reputation: 1235

What is the suggested procedure for dropping a keyspace with 55 TB of data on a 66-node cluster?

There is a keyspace occupying 55 TB of space, and the cluster has 66 nodes (DSE 5.1.6, Cassandra 3.11). The keyspace has 3 tables, and there have been no reads/writes on the tables for the last month.

I want to drop the keyspace/tables to reclaim space without causing any issues in the cluster.

  1. When dropping the tables and keyspace on this cluster, what might cause issues: the size of the unused keyspace (55 TB), or the number of nodes (66) in the cluster to which the schema change (dropping the tables/keyspace) would need to be propagated?
  2. Other than dropping the tables and then the keyspace, is there any other way to safely drop the keyspace? For example, would deleting the SSTables from the nodes make the drops quicker and smoother? Would deleting SSTables trigger repairs/compactions and cause any issues?
  3. Is there any way to disable auto_snapshot at the session level, or from the driver level, for specific tables or a keyspace?
  4. Any considerations before/after dropping the tables/keyspace? Here are the steps I am going to follow:
     a. Run nodetool describecluster
     b. Drop the tables using cqlsh (with request-timeout=600)
     c. Drop the keyspace using cqlsh (with request-timeout=600)
     d. Run nodetool describecluster again and check for any inconsistencies
     e. On each node, delete the data directory for the keyspace (the data is already backed up elsewhere, so there is no need for an auto-snapshot)
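For reference, the planned steps could be scripted roughly as follows. This is a sketch only: "my_keyspace", the table names, and the data directory path are placeholders, and it assumes cqlsh and nodetool are on the PATH of a node in the cluster.

```shell
# a. Confirm all nodes agree on one schema version before starting
nodetool describecluster

# b. Drop each table with a generous client-side timeout
cqlsh --request-timeout=600 -e "DROP TABLE my_keyspace.table1;"
cqlsh --request-timeout=600 -e "DROP TABLE my_keyspace.table2;"
cqlsh --request-timeout=600 -e "DROP TABLE my_keyspace.table3;"

# c. Drop the now-empty keyspace
cqlsh --request-timeout=600 -e "DROP KEYSPACE my_keyspace;"

# d. Check again for schema disagreement across the 66 nodes
nodetool describecluster

# e. On EACH node, remove the leftover data directory
#    (path is a placeholder; the data is already backed up elsewhere)
rm -rf /var/lib/cassandra/data/my_keyspace
```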

Upvotes: 1

Views: 541

Answers (1)

Erick Ramirez

Reputation: 16323

The only real issue I can foresee you running into is a schema disagreement. Given (a) the size of the data and (b) the number of nodes, my approach would be:

  1. Attempt to TRUNCATE one table at a time
  2. If the TRUNCATE times out, attempt it a second time
  3. Once a table is truncated, DROP it
  4. Wait at least 1 minute for the schema to propagate
  5. Check for schema disagreement and fix as appropriate
  6. Repeat the steps above until all tables are dropped
  7. DROP the keyspace
  8. Manually delete the snapshots from the filesystem as necessary
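In cqlsh/nodetool terms, the loop above might look like this. It is a sketch: "my_keyspace" and "my_table" are placeholders, the 600-second request timeout from the question is reused so the TRUNCATE has room to finish, and nodetool clearsnapshot is shown as one way to clean up the snapshots that auto_snapshot leaves behind.

```shell
# Repeat this TRUNCATE/DROP pair for each of the 3 tables:
cqlsh --request-timeout=600 -e "TRUNCATE my_keyspace.my_table;"   # retry once if it times out
cqlsh --request-timeout=600 -e "DROP TABLE my_keyspace.my_table;"
sleep 60                       # give the schema change time to propagate to all 66 nodes
nodetool describecluster       # all nodes should report a single schema version

# Once every table is dropped:
cqlsh -e "DROP KEYSPACE my_keyspace;"

# auto_snapshot takes a snapshot on each TRUNCATE/DROP; clear them on each node
# (or delete the snapshots/ directories from the filesystem directly):
nodetool clearsnapshot
```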

To answer your questions directly:

  1. You can run into timeouts and schema disagreement. And yes, it's a combination of (a) the data size and (b) the number of nodes.
  2. I'd recommend truncating the tables first, as above. Truncation doesn't change the schema, so it can't cause a schema disagreement, and with the data already gone the subsequent DROP should complete without issues.
  3. No, you can only disable auto_snapshot in cassandra.yaml, which requires a rolling restart. You don't want to do that, because restarting all 66 nodes isn't necessary for this.
  4. I've posted the procedure above.
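For context on point 3: auto_snapshot is a node-level setting in cassandra.yaml, with no session-, driver-, table-, or keyspace-level equivalent. The relevant fragment looks like this (a config excerpt; changing it only takes effect after restarting the node, hence the rolling restart):

```yaml
# cassandra.yaml (per node) -- no per-session or per-table override exists
auto_snapshot: true    # default; a snapshot is taken automatically before TRUNCATE/DROP
```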

Upvotes: 2
