stantonk

Reputation: 2010

Implications of SSTable immutability in Cassandra for disk usage

According to:

http://www.datastax.com/docs/1.0/ddl/column_family#about-column-family-compression

The reason RDBMSs see a performance degradation from compression is that the data being overwritten must be located on disk, decompressed, overwritten, and then recompressed. Cassandra, on the other hand, can see a performance increase for both reads and writes because SSTables are immutable: no record is ever overwritten in place, so the overhead is much smaller than for a compressed RDBMS.

I'm wondering what the long-term implications of this are as a Cassandra data store continues to grow. It seems like the only consequence is an ever-growing need for more disk space. Is this correct?

Upvotes: 2

Views: 1037

Answers (1)

psanford

Reputation: 5670

Periodically, Cassandra runs a compaction process on your existing SSTables. Compaction merges multiple SSTables into one new, larger SSTable, discarding obsolete data (values that have since been overwritten and, eventually, rows that have been deleted). After compaction has finished, Cassandra will (eventually) delete the old SSTables.

So if the size of your data set is stable, the disk space used by your SSTables will not grow without bound. The Cassandra wiki contains more information on compaction.
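To make the idea concrete, here is a minimal sketch of what a compaction merge does conceptually. It is a toy model, not Cassandra's actual implementation: the `compact` function, the dict-based "SSTables", and the use of `None` as a delete marker are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Toy model (an assumption, not Cassandra's on-disk format):
# each SSTable maps key -> (write timestamp, value).
# A value of None stands for a tombstone, i.e. a delete marker.
Cell = Tuple[int, Optional[str]]
SSTable = Dict[str, Cell]


def compact(sstables: List[SSTable]) -> SSTable:
    """Merge several SSTables into one, keeping only the newest cell per key."""
    merged: Dict[str, Cell] = {}
    for table in sstables:
        for key, cell in table.items():
            # Last write wins: keep the cell with the highest timestamp.
            if key not in merged or cell[0] > merged[key][0]:
                merged[key] = cell
    # Drop tombstones. Real Cassandra only does this once a grace
    # period (gc_grace_seconds) has passed; this sketch skips that.
    return {k: c for k, c in merged.items() if c[1] is not None}


old = {"a": (1, "v1"), "b": (1, "v1")}
newer = {"a": (2, "v2"), "c": (2, "v1")}  # "a" was overwritten
newest = {"b": (3, None)}                 # "b" was deleted

result = compact([old, newer, newest])
cells_before = sum(len(t) for t in (old, newer, newest))
print(cells_before, "cells before,", len(result), "after")
```

The input tables are never modified; the merge writes a new table and the old ones can then be dropped. That is why overwrites and deletes cost extra disk space only until the next compaction covers them.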

Upvotes: 5
