Reputation: 43
I recently used 9 Cassandra VMs on OpenStack to test our product. Each VM has 16 vCPUs, a 50GB SSD, and 20GB of memory, but I found each node can only sustain 10000+ operations/second at about 70% CPU, which is roughly 90000 operations/second across the 9 nodes.
The data model is a simple read/write mixed scenario on normal tables, and I haven't seen any obvious performance bottleneck during the test. On the internet I can see that some people achieve 4000 ops/s on AWS T2 medium nodes (only 2 vCPUs), and some Cassandra training materials say they can achieve 6000-12000 transactions per second.
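(For reference, a comparable mixed read/write load can be generated with the cassandra-stress tool that ships with Cassandra; the node address and parameters below are placeholders, not my exact test settings.)
# write some data first, then run a 50/50 read/write mix; host and counts are placeholders
cassandra-stress write n=1000000 cl=QUORUM -rate threads=100 -node 10.0.0.1
cassandra-stress mixed ratio\(write=1,read=1\) n=1000000 cl=QUORUM -rate threads=100 -node 10.0.0.1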
Can anyone share their benchmark results on Apache Cassandra?
Upvotes: 1
Views: 769
Reputation: 57748
First of all, Alex is right: schema (specifically primary key definitions) matters. The rest of this answer assumes your data model is free of those anti-patterns.
So the standard deployment image which I use for OpenStack is 16GB RAM w/ 8 CPUs (@ 2.6GHz). That's a smaller amount of RAM than I'd recommend for most production deploys, unless you have some extra time to engineer for efficiency. And yes, there are some clusters where that just wasn't enough and we had to build with more RAM. But this has largely been our standard for about 4 years.
The approach of many small nodes has worked well. There are clusters I've built which have sustained 250k ops/sec.
with 70% CPU
TBH, I've found that CPU with Cassandra doesn't matter as much as it does with other databases. When it gets high, it's usually an indicator of another problem.
I haven't seen any obvious performance bottleneck during the test.
On shared-resource environments (like OpenStack), noisy neighbors are one of the biggest problems. Our storage team has imposed IOPS limits on provisioned disks, in an attempt to keep heavy loads from affecting others. Therefore, our top-performing clusters required specially-configured volumes to allow IOPS levels higher than what would normally be allowed.
Cassandra's metrics can tell you if your disk latency is high. If you see that your disk (read or write) latency is in double-digit milliseconds, then your disk is likely rate-limiting you.
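A quick way to check those numbers (a sketch; swap in your own keyspace and table) is nodetool tablestats, plus iostat on the host for the raw device view:
# per-table latency: check the "Local read latency" / "Local write latency" lines
bin/nodetool tablestats stackoverflow.stockquotes
# OS-level device latency (the await columns), refreshed every 5 seconds
iostat -x 5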
Another thing to look at is your table's histograms (with nodetool tablehistograms). That can give you all sorts of good info, specifically around things like latency and partition sizes.
bin/nodetool tablehistograms stackoverflow.stockquotes
stackoverflow/stockquotes histograms
Percentile  Read Latency  Write Latency  SSTables  Partition Size  Cell Count
                (micros)       (micros)                    (bytes)
50%                 0.00           0.00      0.00              124           5
75%                 0.00           0.00      0.00              124           5
95%                 0.00           0.00      0.00              124           5
98%                 0.00           0.00      0.00              124           5
99%                 0.00           0.00      0.00              124           5
Min                 0.00           0.00      0.00              104           5
Max                 0.00           0.00      0.00              124           5
If you look at the size of your usual partition, you can get an idea of how to optimize the table's chunk size. This value represents the size of the building blocks the table uses to interact with the disk. The table's current compression settings show the chunk size in use:
AND compression = {'chunk_length_in_kb': '64',
'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
For instance, in the case above, I could save a lot on disk payloads just by setting my chunk_length_in_kb down to 1 (the minimum), since my partitions are all less than 1024 bytes.
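A minimal sketch of that change, using the stackoverflow.stockquotes table from above (note that existing SSTables keep their old chunk size until they are rewritten, e.g. via nodetool upgradesstables -a):
-- shrink the compression chunk size to better fit sub-1KB partitions
ALTER TABLE stackoverflow.stockquotes
  WITH compression = {'class': 'org.apache.cassandra.io.compress.LZ4Compressor',
                      'chunk_length_in_kb': '1'};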
In any case, give your disk stats a look and see if there are some "wins" to be had there.
Upvotes: 1