Reputation: 103843
We are using Cassandra 2.0.17 and we have a table whose workload is 50% selects, 40% updates and 10% inserts (no deletes).
To get high read performance for such a table, we found that it is suggested to use LeveledCompactionStrategy (it is supposed to guarantee that 99% of reads will be fulfilled from a single SSTable). Every day when I run nodetool cfhistograms
I see more and more SSTables per read. On the first day we had 1, then we had 1, 2, 3 ...
and this morning I am seeing this:
ubuntu@ip:~$ nodetool cfhistograms prodb groups | head -n 20
prodb/groups histograms
SSTables per Read
1 sstables: 27007
2 sstables: 97694
3 sstables: 95239
4 sstables: 3928
5 sstables: 14
6 sstables: 0
7 sstables: 19
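As a rough check on the histogram above, the share of reads served from a single SSTable can be worked out directly from the counts. This is only a small arithmetic sketch; the dictionary is copied by hand from the output, and the variable names are made up for illustration. It comes out at roughly 12% of reads touching one SSTable, far below the expected figure.

```python
# Counts copied from the "SSTables per Read" output above:
# key = number of SSTables touched by a read, value = number of reads.
reads = {1: 27007, 2: 97694, 3: 95239, 4: 3928, 5: 14, 6: 0, 7: 19}

total = sum(reads.values())

# Fraction of reads that were satisfied from exactly one SSTable.
single_share = reads[1] / total

# Average number of SSTables touched per read.
mean_sstables = sum(k * v for k, v in reads.items()) / total

print(f"total reads sampled: {total}")
print(f"share served from one SSTable: {single_share:.1%}")
print(f"mean SSTables per read: {mean_sstables:.2f}")
```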
Running describe on the groups table returns this:
CREATE TABLE groups (
...
) WITH
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.100000 AND
gc_grace_seconds=172800 AND
index_interval=128 AND
read_repair_chance=0.000000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
default_time_to_live=0 AND
speculative_retry='99.0PERCENTILE' AND
memtable_flush_period_in_ms=0 AND
compaction={'class': 'LeveledCompactionStrategy'} AND
compression={'sstable_compression': 'LZ4Compressor'};
Is this normal? If so, we lose the advantage of using LeveledCompactionStrategy, which according to the documentation should guarantee that 99% of reads are served from a single SSTable.
Upvotes: 7
Views: 1738
Reputation: 461
It does depend on the use case, but as a rule of thumb I normally look at LCS for a 90% read to 10% write ratio. From your description you're looking at 50/50 at best.
The additional compaction demands placed by LCS make it pretty I/O hungry. It's highly likely that compaction is backed up and your levels are not balanced. The easiest way to tell is to run nodetool cfstats for the table in question.
You're looking for the line:
SSTables in each level: [2042/4, 10, 119/100, 232, 0, 0, 0, 0, 0]
The numbers in the square brackets show how many sstables are in each level: [L0, L1, L2, ...]. The number after the slash is the ideal count for that level. As a rule of thumb L1 should be 10, L2 100, L3 1000, etc.
New sstables go in at L0 and then gradually move up. You can see the above example is in a really bad state: there are still about 2000 sstables waiting to be processed in L0, more than exist in all the other levels combined. The performance here will be massively worse than if I'd just used STCS.
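To make the reading of that line concrete, here is a minimal Python sketch that parses it and lists the levels that are over their target. The function names are mine, not part of nodetool. It assumes, as in the sample line, that the "/N" suffix appears only on levels that hold more sstables than they should.

```python
import re

def parse_levels(line):
    """Turn 'SSTables in each level: [2042/4, 10, ...]' into (count, limit) pairs.

    limit is None for levels printed without a '/N' suffix.
    """
    inside = re.search(r"\[(.*?)\]", line).group(1)
    levels = []
    for item in inside.split(","):
        count, _, limit = item.strip().partition("/")
        levels.append((int(count), int(limit) if limit else None))
    return levels

def backed_up_levels(levels):
    """Indexes of levels that carry a '/N' suffix (assumed to mean 'over target')."""
    return [i for i, (_, limit) in enumerate(levels) if limit is not None]

line = "SSTables in each level: [2042/4, 10, 119/100, 232, 0, 0, 0, 0, 0]"
levels = parse_levels(line)
for i in backed_up_levels(levels):
    count, limit = levels[i]
    print(f"L{i}: {count} sstables, target {limit}")
```

For the sample line this flags L0 and L2, which matches the "really bad state" described above.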
Nodetool cfstats makes it pretty easy to measure whether LCS is keeping up with your use case. Just dump out the above line every 15 minutes throughout the day. Any time your levels are unbalanced, read performance will suffer. If it's constantly behind, you probably want to switch to STCS. If it spikes for, say, 10 minutes while you load data but the rest of the day is fine, then you may decide to live with it. If it never goes out of balance, stick with LCS - it's totally working for you.
As a side note - 2.1 allows L0 to carry out STCS-style merging, which will help in the situation where you have a temporary spike. If you're in the ten-minute scenario above, it's almost certainly worth an upgrade.
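One way to do that periodic dump is a small Python loop around nodetool. This is a sketch, not a tested production script. The helper names are hypothetical, and it assumes your nodetool version accepts a keyspace.table argument to cfstats (if it does not, call cfstats without the argument and filter the output yourself).

```python
import subprocess
import time
from datetime import datetime

def levels_line(cfstats_output):
    """Return the 'SSTables in each level' line from cfstats output, or None."""
    for raw in cfstats_output.splitlines():
        if "SSTables in each level" in raw:
            return raw.strip()
    return None

def run_cfstats(target):
    # Assumption: 'nodetool cfstats keyspace.table' is accepted by this version.
    result = subprocess.run(
        ["nodetool", "cfstats", target],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def monitor(target, interval_s=15 * 60, samples=96,
            fetch=run_cfstats, sleep=time.sleep):
    """Print a timestamped levels line every interval_s seconds, samples times."""
    for n in range(samples):
        stamp = datetime.now().isoformat(timespec="seconds")
        print(stamp, levels_line(fetch(target)))
        if n < samples - 1:
            sleep(interval_s)
```

Calling `monitor("prodb.groups")` with the defaults takes 96 samples at 15-minute intervals, which covers about a day; redirect the output to a file and look for stretches where L0 stays far above its target.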
Upvotes: 21