Reputation: 1568
I do know that it's a cassandra anti-pattern to delete rows (and more so – doing it frequently), but in my simple use case I have a local cassandra (single instance, replication factor set to 1) that I use for unit tests, which drop all tables before running, naturally to perform the tests with a clean slate.
Over time, the performance of this cassandra instance degraded extremely. It surprised me a bit that dropping the keyspaces althogether didn't help at all. Only by manually deleting everything in cassandra data directory I managed to recover all the performance.
This solution is quite fine for me as I don't care for the test data I delete over and over again, but it certainly feels a bit weird to have to delete these things manually on file system. Is there a better way to deal with such situation? Or am I going about this whole case completely wrong?
Upvotes: 1
Views: 276
Reputation: 657
Based on the little information provided, I will provide some info:
First, deleting data creates tombstones in cassandra. The default behavior is to keep these tombstones for 10 days, set by the variable gc_grace_seconds.
Given you only have 1 node and don't care about the data once you delete it, you could set gc_grace_seconds to zero. You also could make sure to run compaction after you do a lot of deletes.
Documentation here:
http://docs.datastax.com/en/cql/3.1/cql/cql_reference/tabProp.html
http://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsCompact.html
Lastly, there is a feature known as TTL, Time To Live. You could use that instead of deleting and let the database do the "deletes" once the data expires. If you go this route, I would still set gc_grace_seconds to zero and run compactions (via an hourly cronjob since its a dev environment).
Upvotes: 3