Igor Zubchenok

Reputation: 715

Cassandra row with few columns read performance degradation

I am seeing a read performance degradation in Cassandra v1.2.5 when reading from a single row that now holds few or zero columns, but to which many different columns were previously added and then deleted.

To test I do the following:

After that, reads became ~70 times slower than they were before I added and removed the 500000 columns.

I tried compact, flush, and repair; nothing helps. The read time improved only slightly, to 208.7 ms.

The only thing that helps to restore read performance is to remove the row completely. Writing and reading to other rows are still fast.

Why does this read speed degradation happen, and how can I fix it?

Upvotes: 1

Views: 614

Answers (1)

Richard

Reputation: 11100

The degradation is because of tombstones. Cassandra can't just delete the columns, because if a replica didn't receive the delete, the columns would reappear when that node came back online. For this reason, Cassandra stores deletes as tombstones, which are just like values but with a marker saying the column is deleted.
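To illustrate why tombstones slow the read down, here is a minimal toy model in Python. It is not Cassandra code: the dictionary layout, the `TOMBSTONE` sentinel, and the helper names are all assumptions made for the sketch. It only shows that a delete adds a marker instead of removing data, so a read of an "empty" row still has to examine every marker.

```python
# Toy model of one row (hypothetical, not Cassandra's storage format):
# column name -> (timestamp, value), where value may be a tombstone marker.
TOMBSTONE = object()

def write(row, name, value, ts):
    row[name] = (ts, value)

def delete(row, name, ts):
    # A delete is stored as a marker; nothing is physically removed.
    row[name] = (ts, TOMBSTONE)

def read_live(row):
    """Return the live columns and the number of cells examined."""
    scanned = 0
    live = {}
    for name, (ts, value) in row.items():
        scanned += 1
        if value is not TOMBSTONE:
            live[name] = value
    return live, scanned

row = {}
for i in range(1000):
    write(row, f"col{i}", b"x", ts=i)
for i in range(1000):
    delete(row, f"col{i}", ts=1000 + i)

live, scanned = read_live(row)
print(live, scanned)  # {} 1000
```

The row returns no live columns, yet the read still walked 1000 cells. With 500000 deleted columns the same effect is proportionally larger.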

Tombstones become eligible for removal once gc_grace_seconds has elapsed, and are then dropped by a subsequent compaction. By this time, it is assumed all replicas will have seen the delete, so the tombstones can safely be removed. The default is 10 days (864000 seconds). You can control it per column family: if in your use case you delete at consistency level ALL, or columns coming back to life don't matter too much, you could even lower it to 0.
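The timing rule can be sketched as a small check. This is a simplified model, not Cassandra's implementation: the function name `is_purgeable`, the epoch-second timestamps, and the strict comparison at the boundary are assumptions for illustration. Only the 10-day default comes from the text above.

```python
DEFAULT_GC_GRACE_SECONDS = 10 * 24 * 60 * 60  # 10 days = 864000 seconds

def is_purgeable(deleted_at, now, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """True once more than gc_grace_seconds have passed since the delete."""
    return now - deleted_at > gc_grace_seconds

deleted_at = 1_000_000  # hypothetical delete time, in epoch seconds
one_day = 24 * 60 * 60

print(is_purgeable(deleted_at, deleted_at + one_day))                # False
print(is_purgeable(deleted_at, deleted_at + 11 * one_day))           # True
print(is_purgeable(deleted_at, deleted_at + 1, gc_grace_seconds=0))  # True
```

With the default setting, a compaction run one day after the delete has to keep the tombstone. With gc_grace_seconds set to 0, the next compaction can drop it, at the cost of the resurrection risk described above.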

Alternatively, if you want to delete a whole row, you can do a row delete rather than deleting individual columns. This inserts a row tombstone which, after compaction, means reading the row should be about as quick as if you had never inserted the now deleted columns.
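As a rough sketch of why a row delete helps, the toy model below treats a row tombstone as a single timestamp that shadows every column written at or before it. The name `compact_row` and the data layout are hypothetical, and real compaction is considerably more involved; the point is only that one marker lets compaction discard all of the older cells.

```python
def compact_row(columns, row_deleted_at):
    """Keep only the cells written after the row tombstone's timestamp."""
    return {
        name: (ts, value)
        for name, (ts, value) in columns.items()
        if ts > row_deleted_at
    }

# 1000 columns written at timestamps 0..999, then the whole row deleted at 1000.
columns = {f"col{i}": (i, b"x") for i in range(1000)}
row_deleted_at = 1000

# A column written after the row delete is not shadowed by it.
columns["fresh"] = (1001, b"y")

compacted = compact_row(columns, row_deleted_at)
print(len(columns), len(compacted))  # 1001 1
```

After this model's compaction, a read touches one cell instead of one per deleted column.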

Upvotes: 2
