Lirm

Reputation: 413

Cassandra DB slowly consuming the disk

I have a Cassandra database used to persist the last hour of a steady stream of messages. The TTL on each row is set to 1 hour. Querying the DB confirms that old records are gone, but disk utilization keeps going up. It sometimes drops a little (due to compaction, I assume), but the overall trend over about a week is growing disk usage until the disk is full, at which point the node stops accepting data.

Killing the process and restarting cleans up a little, but it starts at about 60G disk utilization on about 8-9G of actual data.

Trying to run ./nodetool compact just hangs there.

Where is the disk consumption coming from?

Upvotes: 0

Views: 108

Answers (1)

RussS

Reputation: 16576

TTL doesn't mean that your data vanishes from the disk. What it actually does is create a tombstone, which indicates that the record was deleted. This tombstone has to stick around in case another node did not receive the delete or suffered a network partition. Tombstones are not removed until gc_grace_seconds has elapsed, which is 10 days by default, so your data is going to stick around on disk until then. The delay gives you time to run a repair before the tombstones are finally removed, which keeps dead data from being resurrected from a replica.

http://wiki.apache.org/cassandra/DistributedDeletes
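
To make the timing concrete, here is a rough sketch of the arithmetic for the 1 hour TTL from the question. The function name, the constant, and the example timestamp are made up for illustration; this is not a Cassandra API, and it only gives the earliest point at which a compaction could drop the row entirely. Compaction still has to actually run on the SSTables holding it.

```python
from datetime import datetime, timedelta, timezone

# 10 days, the default gc_grace_seconds mentioned above
DEFAULT_GC_GRACE_SECONDS = 864_000


def earliest_purge_time(write_time, ttl_seconds,
                        gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """Hypothetical helper: earliest moment a compaction could purge the row."""
    # The row stops being visible to queries once its TTL has elapsed.
    expires_at = write_time + timedelta(seconds=ttl_seconds)
    # The resulting tombstone must then outlive the grace period.
    return expires_at + timedelta(seconds=gc_grace_seconds)


# Example write timestamp (assumed, for illustration only).
written = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
purge = earliest_purge_time(written, ttl_seconds=3600)

print(purge - written)  # 10 days, 1:00:00
```

So with the defaults, a row that disappears from queries after an hour can still occupy disk for a bit over ten days. gc_grace_seconds is a per-table setting and can be lowered, but only do that if repairs complete more often than the new window, for the resurrection reason described above.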

Upvotes: 2
