Fify

Reputation: 141

Cassandra lookup query is very slow after deleting a large amount of data

I have a Cassandra column family with a large number of rows, say more than 100,000. I would like to remove all the data in this column family, and that is where the problem comes up:

After all the data has been removed, a lookup query on this column family takes tens of seconds to return an empty result, and the time grows linearly with the size of the original data.

This is caused by tombstones, which Cassandra writes when data is deleted. Lookup speed does not return to normal until the tombstones are garbage-collected at the next GC. See Cassandra Distributed Deletes.

Because such queries are used frequently in my system, I cannot tolerate this much latency.

Would you please give me a solution to this problem?

Upvotes: 5

Views: 3293

Answers (2)

Lyuben Todorov

Reputation: 14153

This sounds like a very bad way to use a database: populate it, empty it, repeat. One way to solve your problem is to use a different CF name each time: when you empty the data and start repopulating, create a new column family, use that, and simply drop the old column family. This is hacky, however.

I'd suggest using compaction (which gets rid of all the tombstones it can detect) to solve your problem. It is CPU intensive, but that is better than waiting tens of seconds for queries to respond. You can make the task less intensive on your machine by specifying the keyspace and column family you want to compact:

./nodetool compact <ks_name> <cf_name>

Ritchard's point is a good one: gc_grace_seconds is set to 10 days (864000 seconds) by default, so you will probably have to lower it before compaction can get rid of the tombstones.
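As a rough sketch of that tweak, assuming a CQL 3 schema and cqlsh on the path: the helper below only builds the ALTER TABLE statement, and the commented lines show how it might be applied before compacting. The names my_ks and my_cf and the value 3600 are placeholders, not values from the question. Be aware that a gc_grace_seconds shorter than your repair interval can let deleted data reappear if a node misses the delete.

```shell
# Build the CQL statement that changes gc_grace_seconds on one column family.
# Arguments: keyspace, column family, new value in seconds.
gc_grace_cql() {
  printf 'ALTER TABLE %s.%s WITH gc_grace_seconds = %s;\n' "$1" "$2" "$3"
}

# Hypothetical usage (my_ks, my_cf and 3600 are placeholders):
#   cqlsh -e "$(gc_grace_cql my_ks my_cf 3600)"
#   ./nodetool compact my_ks my_cf
```

Once the bulk delete has been compacted away, you can set the value back to something that fits your repair schedule.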

Upvotes: 3

doanduyhai

Reputation: 8812

@Fify

If your column family is frequently modified (read, then update, then read the update again...), you should use the leveled compaction strategy.
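For illustration, switching strategies might look like the sketch below. It assumes CQL 3 syntax (older versions set this through a different property) and cqlsh on the path, and my_ks and my_cf are placeholder names.

```shell
# Build the CQL statement that switches one column family to leveled compaction.
# Arguments: keyspace, column family.
leveled_compaction_cql() {
  printf "ALTER TABLE %s.%s WITH compaction = {'class': 'LeveledCompactionStrategy'};\n" "$1" "$2"
}

# Hypothetical usage (my_ks and my_cf are placeholders):
#   cqlsh -e "$(leveled_compaction_cql my_ks my_cf)"
```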

To have deleted columns removed more quickly, lower the gc_grace_seconds property of your column family.

Upvotes: 0
