Reputation: 1336
I have a use case where a large number of rows in Cassandra are frequently read and updated, with a write/read ratio slightly above 1. Also, writes in most cases replace all the values in a row. I'm wondering how to optimize for such a use case. Usually, leveled compaction is suggested, but since the whole row essentially gets reinserted, size-tiered compaction seems a better fit. Am I right? Are there any specific optimizations that can also be done in such a case?
Upvotes: 1
Views: 152
Reputation: 13731
It depends on what you're trying to optimize. Leveled compaction and size-tiered compaction have different upsides and downsides in your use case, and which one is better for you may depend on the specifics of your use case or hardware:
Leveled Compaction Strategy (LCS), which other people seem to be warmly recommending in their responses, has the benefit of wasting the least disk space - around 10% - on old data which has already been overwritten. On the other hand, the biggest downside of LCS is that it uses a lot more disk I/O - rewriting the same data over and over to maintain the low space usage. Since your use case is heavy in writes (as many as half of the requests are writes), this extra write I/O may become a big problem.
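If you want to try LCS on an existing table, something along these lines should do it. This is only a sketch: `my_keyspace.my_table` is a placeholder name, not something from the question, and the other LCS options are left at their defaults.

```cql
-- Hypothetical keyspace/table names; substitute your own.
-- Switches the table to leveled compaction, leaving all other
-- compaction options at their defaults.
ALTER TABLE my_keyspace.my_table
WITH compaction = { 'class': 'LeveledCompactionStrategy' };
```

Changing the strategy makes Cassandra recompact the existing sstables under the new scheme, so expect a burst of extra disk I/O after running it.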
Size-Tiered Compaction Strategy (STCS) will need to do less I/O work per write, but at the same time waste more disk space: By default you can have as many as 4 versions (!) of each row stored in 4 different sstables before compaction kicks in and gets rid of the older copies. You can significantly reduce this waste by setting min_threshold=2 instead of the default 4, but it will still not come close to the space-optimality of leveled compaction. Cassandra's Size-Tiered compaction implementation also has the problem that during compaction it needs both input and output files to exist at the same time - leading to the often-quoted need to always leave half of the disk space free (ScyllaDB has a solution to this last problem, but Apache Cassandra does not).
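For reference, the min_threshold change mentioned above can be applied per table roughly like this. Again a sketch with a placeholder table name (`my_keyspace.my_table` is an assumption for illustration):

```cql
-- Hypothetical keyspace/table names; substitute your own.
-- Keeps size-tiered compaction but lets a compaction start once
-- 2 similarly sized sstables exist, instead of the default 4.
ALTER TABLE my_keyspace.my_table
WITH compaction = {
  'class': 'SizeTieredCompactionStrategy',
  'min_threshold': '2'
};
```

A lower threshold means compactions run more often, so you trade some of the I/O savings of STCS for less wasted space.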
To summarize, with STCS you will need more disk space, while with LCS you will need more disk bandwidth. Which one is the worse problem for you depends on your hardware and how close you are to being bottlenecked by the disk's bandwidth, the amount of disk space, or neither.
For more details about these issues, you can check out a blog post I wrote on Size-tiered compaction and space amplification problem, and another one on Leveled Compaction and its write-amplification problem.
Upvotes: 1