mikestaszel

Reputation: 362

Freeing disk space of overwritten data?

I have a table whose rows get overwritten frequently using regular INSERT statements. This table holds ~50 GB of data, and the majority of it is overwritten daily.

However, according to OpsCenter, disk usage keeps going up and is not freed.

I have validated that rows are being overwritten and not simply being appended to the table. But they're apparently still taking up space on disk.

How can I free disk space?

Upvotes: 1

Views: 64

Answers (1)

bechbd

Reputation: 6341

Under the covers, Cassandra does not modify the existing row in place when it handles these writes. Instead, a new version of the row is written with a newer timestamp, and it ends up in a new SSTable, because SSTables are immutable once written. When you perform a read, the newest version (based on timestamp) is returned to you as the row. However, this also means that the old and new versions both sit on disk, so the overwritten rows use at least twice the disk space. It is not until Cassandra runs a compaction operation that the older versions are removed and the disk space recovered. Here is some information on how Cassandra writes to disk, which explains the process:
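To make the overwrite behaviour concrete, here is a toy Python model of it. This is only a sketch under simplifying assumptions: the names `read`, `compact`, and `stored_cells` are made up for illustration, SSTables are modelled as plain dictionaries, and real compaction strategies are considerably more involved.

```python
# Toy model only: real SSTables are immutable on-disk files, not Python dicts.
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, str]        # (write timestamp, value)
SSTable = Dict[str, Cell]     # row key -> cell


def read(sstables: List[SSTable], key: str) -> Optional[str]:
    """Return the value with the newest timestamp across all SSTables."""
    versions = [table[key] for table in sstables if key in table]
    return max(versions)[1] if versions else None


def compact(sstables: List[SSTable]) -> List[SSTable]:
    """Merge all SSTables into one, keeping only the newest version per key."""
    merged: SSTable = {}
    for table in sstables:
        for key, cell in table.items():
            if key not in merged or cell[0] > merged[key][0]:
                merged[key] = cell
    return [merged]


def stored_cells(sstables: List[SSTable]) -> int:
    """Count every stored version, as a stand-in for disk usage."""
    return sum(len(table) for table in sstables)


# Two days of "overwrites": each day's INSERTs land in a new SSTable.
sstables = [
    {"row1": (1, "monday"), "row2": (1, "monday")},
    {"row1": (2, "tuesday"), "row2": (2, "tuesday")},
]
```

In this model, reads already return only the newest values, but `stored_cells(sstables)` still counts both versions of each row. Only after `compact(sstables)` does the count drop back to one version per row, which mirrors why disk usage keeps growing until compaction runs.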

http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_write_path_c.html?scroll=concept_ds_wt3_32w_zj__dml-compaction

A compaction is done on a node-by-node basis and is a very disk-intensive operation, which may affect the performance of your cluster while it is running. You can run a manual compaction using the nodetool compact command:

https://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsCompact.html
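If you want to script this, a minimal Python sketch could look like the following. It assumes `nodetool` is on the PATH of the node you run it on. The helper names `build_compact_command` and `run_compaction`, and the keyspace and table names in the usage comment, are hypothetical placeholders, not anything from your cluster.

```python
import subprocess
from typing import List, Optional


def build_compact_command(keyspace: Optional[str] = None,
                          table: Optional[str] = None) -> List[str]:
    """Build the argument list for `nodetool compact [keyspace [table]]`."""
    cmd = ["nodetool", "compact"]
    if keyspace:
        cmd.append(keyspace)
        # A table name only makes sense together with its keyspace.
        if table:
            cmd.append(table)
    return cmd


def run_compaction(keyspace: Optional[str] = None,
                   table: Optional[str] = None) -> None:
    """Run a manual compaction on the local node; raises if nodetool fails."""
    subprocess.run(build_compact_command(keyspace, table), check=True)


# Usage (hypothetical names; run once on each node, ideally off-peak):
# run_compaction("my_keyspace", "my_table")
```

Limiting the compaction to the one table that is being overwritten keeps the extra disk I/O smaller than compacting every keyspace on the node.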

As Aaron mentioned in his comment above, overwriting all the data in your cluster daily is not really the best use case for Cassandra, because of issues such as this one.

Upvotes: 3
