Reputation: 21
I'm using Cassandra 2.1.8 for managing various assets, mostly images. The cluster is set up with RF=3 on 3 nodes for the main CF, using Leveled Compaction. I deleted a large portion of the data from the cluster (using a CQL script with lots of deletes), but the space has not yet been reclaimed. Furthermore, one node experienced corruption on SSTables, so I chose to wipe out its data and run repair to re-create it. The repaired node now uses 250GB in 100 SSTables, while the other 2 use 750GB in 300 SSTables. "nodetool cfstats" shows 4.5M keys on the repaired node, and 17M on the other two.
Is there a way to force a cleanup on these 2 nodes? Running "nodetool compact" did not seem to have much effect on them, and it finishes rather quickly.
Upvotes: 2
Views: 4760
Reputation: 1949
I've had a similar issue with tombstones not being deleted. The solution was to set the unchecked_tombstone_compaction property on the table.
If I understand correctly, this allows the deletes to be compacted away even in cases where the table is not fully repaired:
ALTER TABLE myTable WITH COMPACTION = { 'class': 'DateTieredCompactionStrategy','unchecked_tombstone_compaction': 'true' };
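Note that the 'class' value in the statement above also sets the compaction strategy, so it would switch a table away from whatever strategy it currently uses. Since the question mentions Leveled Compaction, a variant that keeps that strategy might look like the sketch below. The keyspace and table names are placeholders, and the tombstone_threshold value is only an illustrative assumption (0.2 is the documented default), not a tuned recommendation.

```cql
-- Sketch only: my_keyspace.assets is a placeholder name.
-- Keeps LeveledCompactionStrategy and enables single-SSTable tombstone
-- compaction without the pre-check for overlapping SSTables.
ALTER TABLE my_keyspace.assets
WITH compaction = {
  'class': 'LeveledCompactionStrategy',
  'unchecked_tombstone_compaction': 'true',
  'tombstone_threshold': '0.2'  -- assumed value; 0.2 is the default ratio
};
```

Even with this option, tombstones younger than the table's gc_grace_seconds are not dropped.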
Upvotes: 0
Reputation: 1653
Space is not reclaimed because deletes in Cassandra are not "instant", at least from a storage perspective. A delete writes a tombstone, and it's not until GC_GRACE (the table's gc_grace_seconds setting) has expired and compaction runs that the data effectively gets removed.
Now everyone's first instinct is to go and set GC_GRACE=0 so that data goes away faster. That's not what you want to do. Here's why: https://lostechies.com/ryansvihla/2014/10/20/domain-modeling-around-deletes-or-using-cassandra-as-a-queue-even-when-you-know-better/
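To see what grace period currently applies, something like the following should work on 2.1, where table settings live in system.schema_columnfamilies. The keyspace and table names are placeholders. The ALTER statement simply restates the default of 864000 seconds (10 days) to show where the setting lives; it is not a suggestion to change it.

```cql
-- Sketch only: my_keyspace / assets are placeholder names.
-- Read the current grace period for one table (Cassandra 2.1 schema tables).
SELECT keyspace_name, columnfamily_name, gc_grace_seconds
FROM system.schema_columnfamilies
WHERE keyspace_name = 'my_keyspace'
  AND columnfamily_name = 'assets';

-- The setting is per table; 864000 seconds (10 days) is the default.
ALTER TABLE my_keyspace.assets WITH gc_grace_seconds = 864000;
```

If the value is ever lowered, it generally needs to stay longer than the interval between repairs, otherwise deleted data can reappear.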
Your script that did a lot of deletes - did it do whole partition deletes? Or column deletes? Note, you may experience some overhead during compaction at that time depending on how many deletes you did and how you did your deletes.
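To make the distinction concrete, here is a minimal sketch of the two kinds of delete against a hypothetical table. The schema, names, and UUID are made up for illustration, and the keyspace is assumed to exist already.

```cql
-- Hypothetical schema, for illustration only.
CREATE TABLE IF NOT EXISTS my_keyspace.assets (
  asset_id   uuid PRIMARY KEY,
  name       text,
  image_data blob
);

-- Whole-partition delete: writes one partition-level tombstone.
DELETE FROM my_keyspace.assets
WHERE asset_id = 123e4567-e89b-12d3-a456-426655440000;

-- Column delete: writes a tombstone for that single column only;
-- the rest of the row stays in place.
DELETE image_data FROM my_keyspace.assets
WHERE asset_id = 123e4567-e89b-12d3-a456-426655440000;
```

A script made up of many column deletes leaves many more tombstones for compaction to work through than the same number of partition deletes.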
Upvotes: 2