Reputation: 75
There is a memtable heap size setting in the cassandra.yaml file. Let's say it's 2 GB. If the cleanup threshold is 33%, then once about 675 MB of memtable space is occupied, Cassandra will flush the largest memtable to disk. My question is: what does Cassandra do with the remaining memtable space, that is 1373 MB (2048 - 675)?
As I understand it, the data in memtable space will never be more than 675 MB at any point in time: the moment memtable data grows beyond 675 MB, the largest memtable gets flushed to disk and the data size in memtable space drops below 675 MB again, and this process repeats. So why do we need to assign 2 GB of memtable space? What is the reason behind it? Does memtable space contain anything other than memtables?
Upvotes: 1
Views: 893
Reputation: 16400
Flushing is not instantaneous, and it does not stop other writes from coming in. It essentially creates a new "active" memtable for the writes and puts the previous one on a queue to get flushed to disk (it can still be used for reads until flushed). So the space used on heap can most definitely exceed your threshold * space.
This behavior is different on older versions of Cassandra where it would actually block the writes until the flush completes (tpstats showed this as blocked under the FlushWriter, which is no longer possible).
Since the size of the memtables can continue to grow while flushing occurs, there is a cut-off limit (the memtable_heap_space_in_mb setting) at which Cassandra would actually stop writes, to prevent usage from spinning out of control and causing OutOfMemory exceptions. This setting is more an upper limit that memtable usage can grow to than a chunk of memory allocated immediately and reserved for the memtables.
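To make the threshold vs. limit distinction concrete, here is a rough toy model in Python. It is not how Cassandra is implemented: the single flush queue, the tick-based loop, and the write and flush rates are all made-up assumptions for illustration. It only shows that when writes arrive faster than a flush drains, heap usage climbs past threshold * space and is capped only by the hard limit.

```python
def simulate(limit_mb=2048, cleanup_threshold=0.33,
             write_mb_per_tick=100, flush_mb_per_tick=60, ticks=40):
    """Toy model (hypothetical rates): returns (threshold_mb, peak_mb, blocked_ticks)."""
    threshold_mb = limit_mb * cleanup_threshold
    active = 0.0        # memtable currently accepting writes
    flushing = 0.0      # memtable being written to disk, still held on heap
    peak = 0.0
    blocked_ticks = 0
    for _ in range(ticks):
        # Writes are refused only when they would push usage past the hard limit.
        if active + flushing + write_mb_per_tick <= limit_mb:
            active += write_mb_per_tick
        else:
            blocked_ticks += 1
        # A flush drains gradually; it is not instantaneous.
        flushing = max(0.0, flushing - flush_mb_per_tick)
        # Over the threshold and no flush in progress: swap in a new active memtable.
        if flushing == 0 and active > threshold_mb:
            flushing, active = active, 0.0
        peak = max(peak, active + flushing)
    return threshold_mb, peak, blocked_ticks


if __name__ == "__main__":
    threshold_mb, peak_mb, blocked = simulate()
    print(f"flush threshold: {threshold_mb:.0f} MB")
    print(f"peak heap usage: {peak_mb:.0f} MB")
    print(f"ticks with writes blocked: {blocked}")
```

With a flush rate at or above the write rate, the peak stays close to the threshold, which is the situation the question assumes.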
Also note that memtable_cleanup_threshold is deprecated: "The default calculation is the only reasonable choice."
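For reference, the default calculation described in the cassandra.yaml comments is 1 / (memtable_flush_writers + 1). The sketch below just does that arithmetic; the function names are mine, and it assumes heap-only memtables (no off-heap space) to keep the numbers simple.

```python
def default_cleanup_threshold(memtable_flush_writers: int) -> float:
    """Default ratio, assuming the 1 / (memtable_flush_writers + 1) rule."""
    if memtable_flush_writers < 1:
        raise ValueError("memtable_flush_writers must be at least 1")
    return 1.0 / (memtable_flush_writers + 1)


def flush_trigger_mb(memtable_heap_space_in_mb: int, memtable_flush_writers: int) -> float:
    """Approximate memtable size (MB) at which a flush of the largest memtable starts."""
    return memtable_heap_space_in_mb * default_cleanup_threshold(memtable_flush_writers)


if __name__ == "__main__":
    # Two flush writers give a ratio of about 0.33, matching the 33% in the question.
    print(default_cleanup_threshold(2))
    print(flush_trigger_mb(2048, 2))
```

So the 33% in the question is what you get with two flush writers, rather than a value that needs to be tuned by hand.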
Upvotes: 3