RAJSEKHAR

Reputation: 21

Cassandra: compacting wide rows and large partitions

I have been reading docs online to get a good understanding of how to tackle large partitions in Cassandra.

I followed a document at the link below: https://www.safaribooksonline.com/library/view/cassandra-high-performance/9781849515122/ch13s10.html. Regarding "LARGE ROWS WITH COMPACTION LIMITS", it says:

"The default value for in_memory_compaction_limit_in_mb is 64. This value is set in conf/cassandra.yaml. For use cases that have fixed columns, the limit should never be exceeded. Setting this value can work as a sanity check to ensure that processes are not inadvertently writing to many columns to the same key. Keys with many columns can also be problematic when using the row cache because it requires the entire row to be stored in memory."

In conf/cassandra.yaml, I did find a setting named "in_memory_compaction_limit_in_mb".

The definition in cassandra.yaml reads as below.

In Cassandra 2.0: in_memory_compaction_limit_in_mb (Default: 64). Size limit for rows being compacted in memory. Larger rows spill to disk and use a slower two-pass compaction process. When this occurs, a message is logged specifying the row key. The recommended value is 5 to 10 percent of the available Java heap size.
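
For reference, here is roughly what that entry looks like in a 2.0-era conf/cassandra.yaml (64 is the documented default; my environment has 800 here; the comment wording is mine, not the shipped file's):

    # conf/cassandra.yaml (Cassandra 2.x)
    # Rows up to this size are compacted entirely in memory; larger rows
    # spill to disk and use the slower two-pass compaction.
    in_memory_compaction_limit_in_mb: 64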

In Cassandra 3.0 there is no such entry in cassandra.yaml; the closest setting is compaction_large_partition_warning_threshold_mb (Default: 100). Cassandra logs a warning when compacting partitions larger than the set value.
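
In a 3.x cassandra.yaml the corresponding entry would look roughly like this (100 is the documented default; treat the snippet as a sketch):

    # conf/cassandra.yaml (Cassandra 3.x)
    # Log a warning when compacting any partition larger than this size.
    compaction_large_partition_warning_threshold_mb: 100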

I have been searching a lot for what exactly the setting in_memory_compaction_limit_in_mb does. The docs mention that some compaction is done in memory and some compaction is done on disk. As my understanding goes, when the compaction process runs:

1. SSTables are read from disk
2. they are merged in memory (rows compared, tombstones removed, stale data removed)
3. the new SSTable is written to disk
4. the old SSTables are removed

These operations account for the high disk space requirement and disk I/O (bandwidth) of compaction. Do help me out if my understanding of compaction is wrong. Is there anything in compaction that happens in memory? In my environment in_memory_compaction_limit_in_mb is set to 800. I need to understand the purpose and implications. A simplified sketch of what I mean by the merge step follows below.
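
To make the merge step concrete, here is a simplified Python sketch of a compaction-style merge over sorted SSTable runs. This is only an illustration of the idea, not Cassandra's actual implementation; merge_sstables, Cell, and gc_before are made-up names:

    import heapq
    from typing import Iterator, List, Tuple

    # A "cell" here is (key, timestamp, value); value None marks a tombstone.
    Cell = Tuple[str, int, object]

    def merge_sstables(runs: List[List[Cell]], gc_before: int) -> Iterator[Cell]:
        """Merge sorted runs, keep only the newest cell per key, and drop
        tombstones older than gc_before (i.e. whose grace period expired)."""
        # Each run is sorted by (key, -timestamp), so per key the newest wins.
        merged = heapq.merge(*runs, key=lambda c: (c[0], -c[1]))
        current_key = None
        for key, ts, value in merged:
            if key == current_key:
                continue              # stale data: a newer cell already won
            current_key = key
            if value is None and ts < gc_before:
                continue              # expired tombstone: drop it entirely
            yield (key, ts, value)    # survivor goes into the new SSTable

    # Usage: two small "SSTables" as sorted lists of cells.
    old = [("a", 1, "x"), ("b", 1, "y")]
    new = [("a", 2, None), ("c", 2, "z")]   # "a" was deleted later
    print(list(merge_sstables([old, new], gc_before=0)))
    # [('a', 2, None), ('b', 1, 'y'), ('c', 2, 'z')]

Note that because the merge is a generator over sorted inputs, it only ever holds a few cells in memory at a time, regardless of how big the inputs are.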

Thanks in advance

Upvotes: 2

Views: 751

Answers (1)

Chris Lohfink

Reputation: 16400

in_memory_compaction_limit_in_mb is no longer necessary, since the size doesn't need to be known before writing. There is no longer a two-pass compaction, so it can be ignored. Compaction doesn't have to process the entire partition at once, just a row at a time.

Now the primary cost is deserializing the large index at the beginning of the partition, which happens in memory. You can increase column_index_size_in_kb to reduce the size of that index (at the cost of more IO during reads, but that is likely insignificant compared to the deserialization). Also, if you use a newer version (3.11+), the index is lazily loaded once it exceeds a certain size, which improves things quite a bit.
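
For example, in cassandra.yaml (64 is the shipped default; the right value depends on your read patterns, so treat this as a sketch):

    # conf/cassandra.yaml
    # Granularity of the per-partition column index. A larger value means
    # fewer index entries (a smaller index to deserialize on access), but
    # coarser seeks, so reads may scan more data to find a row.
    column_index_size_in_kb: 64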

Upvotes: 2
