Reputation: 993
I have a 3-node cluster (all nodes on the same 16-core box, running in virtual boxes via LXC, but each node on its own 3 TB disk).
My table is this:
CREATE TABLE history (
    id text,
    idx bigint,
    data bigint,
    PRIMARY KEY (id, idx)
) WITH CLUSTERING ORDER BY (idx DESC);
id stores an identifier as a string, idx is a timestamp in ms, and data holds my data. According to all the examples I found, this seems to be a correct schema for time series data.
My query is:
SELECT idx, data FROM history WHERE id = ? LIMIT 2;
This returns the 2 most recent (based on idx) rows.
Since id is the partition key and idx the clustering key, the docs I found claim that this query should be very performant with Cassandra. But my benchmarks say otherwise.
I've loaded about 400 GB in total (split across the 3 nodes) and am now running queries from a second box. Using 16 or 32 threads, I run the query above, but the performance is really low for 3 nodes running on 3 separate disks:
throughput: 61 avg time: 614,808 μs
throughput: 57 avg time: 519,651 μs
throughput: 52 avg time: 569,245 μs
So that is ~55 queries per second, with each query taking about half a second (sometimes they take 200 ms).
I find this really low.
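As a sanity check on these numbers, here is a minimal sketch using Little's law: in a closed loop where each thread issues one request at a time, throughput is roughly the thread count divided by the average latency. The thread count of 32 is an assumption (the benchmark used 16 or 32), and `expected_throughput` is a hypothetical helper name.

```python
def expected_throughput(threads: int, avg_latency_us: float) -> float:
    """Closed-loop estimate: each thread has exactly one request in flight."""
    return threads / (avg_latency_us / 1_000_000)


# Average latencies from the benchmark output above, in microseconds.
latencies_us = [614_808, 519_651, 569_245]

for latency in latencies_us:
    # 32 threads is an assumption; the benchmark used 16 or 32.
    qps = expected_throughput(32, latency)
    print(f"{latency} us -> {qps:.1f} queries/s")
```

If the estimate lands close to the measured ~55 queries per second, the throughput is simply a consequence of the per-query latency, so the latency is what needs explaining.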
Can someone please tell me whether my schema is correct and, if not, suggest a better one? If my schema is correct, how can I find out what is going wrong?
Disk I/O on the 16-core box:
Device:   tps      MB_read/s  MB_wrtn/s  MB_read  MB_wrtn
sda       0.00     0.00       0.00       0        0
sdb       135.00   6.76       0.00       6        0
sdc       149.00   6.99       0.00       6        0
sdd       124.00   7.21       0.00       7        0
The Cassandra processes don't use more than 1 CPU core each.
EDIT: With tracing on I get a lot of lines like the following when I run a simple query for 1 id:
Key cache hit for sstable 33259 | 20:16:26,699 | 127.0.0.1 | 5830
Seeking to partition beginning in data file | 20:16:26,699 | 127.0.0.1 | 5833
Bloom filter allows skipping sstable 33256 | 20:16:26,699 | 127.0.0.1 | 5923
Bloom filter allows skipping sstable 33255 | 20:16:26,699 | 127.0.0.1 | 5932
Bloom filter allows skipping sstable 33252 | 20:16:26,699 | 127.0.0.1 | 5938
Key cache hit for sstable 33247 | 20:16:26,699 | 127.0.0.1 | 5948
Seeking to partition beginning in data file | 20:16:26,699 | 127.0.0.1 | 5951
Bloom filter allows skipping sstable 33246 | 20:16:26,699 | 127.0.0.1 | 6072
Bloom filter allows skipping sstable 33243 | 20:16:26,699 | 127.0.0.1 | 6081
Key cache hit for sstable 33242 | 20:16:26,699 | 127.0.0.1 | 6092
Seeking to partition beginning in data file | 20:16:26,699 | 127.0.0.1 | 6095
Bloom filter allows skipping sstable 33240 | 20:16:26,699 | 127.0.0.1 | 6187
Key cache hit for sstable 33237 | 20:16:26,699 | 127.0.0.1 | 6198
Seeking to partition beginning in data file | 20:16:26,699 | 127.0.0.1 | 6201
Key cache hit for sstable 33235 | 20:16:26,699 | 127.0.0.1 | 6297
Seeking to partition beginning in data file | 20:16:26,699 | 127.0.0.1 | 6301
Bloom filter allows skipping sstable 33234 | 20:16:26,699 | 127.0.0.1 | 6393
Key cache hit for sstable 33229 | 20:16:26,699 | 127.0.0.1 | 6404
Seeking to partition beginning in data file | 20:16:26,699 | 127.0.0.1 | 6408
Bloom filter allows skipping sstable 33228 | 20:16:26,699 | 127.0.0.1 | 6496
Key cache hit for sstable 33227 | 20:16:26,699 | 127.0.0.1 | 6508
Seeking to partition beginning in data file | 20:16:26,699 | 127.0.0.1 | 6511
Key cache hit for sstable 33226 | 20:16:26,699 | 127.0.0.1 | 6601
Seeking to partition beginning in data file | 20:16:26,699 | 127.0.0.1 | 6605
Key cache hit for sstable 33225 | 20:16:26,700 | 127.0.0.1 | 6692
Seeking to partition beginning in data file | 20:16:26,700 | 127.0.0.1 | 6696
Key cache hit for sstable 33223 | 20:16:26,700 | 127.0.0.1 | 6785
Seeking to partition beginning in data file | 20:16:26,700 | 127.0.0.1 | 6789
Key cache hit for sstable 33221 | 20:16:26,700 | 127.0.0.1 | 6876
Seeking to partition beginning in data file | 20:16:26,700 | 127.0.0.1 | 6880
Bloom filter allows skipping sstable 33219 | 20:16:26,700 | 127.0.0.1 | 6967
Key cache hit for sstable 33377 | 20:16:26,700 | 127.0.0.1 | 6978
Seeking to partition beginning in data file | 20:16:26,700 | 127.0.0.1 | 6981
Key cache hit for sstable 33208 | 20:16:26,700 | 127.0.0.1 | 7071
Seeking to partition beginning in data file | 20:16:26,700 | 127.0.0.1 | 7075
Key cache hit for sstable 33205 | 20:16:26,700 | 127.0.0.1 | 7161
Seeking to partition beginning in data file | 20:16:26,700 | 127.0.0.1 | 7166
Bloom filter allows skipping sstable 33201 | 20:16:26,700 | 127.0.0.1 | 7251
Bloom filter allows skipping sstable 33200 | 20:16:26,700 | 127.0.0.1 | 7260
Key cache hit for sstable 33195 | 20:16:26,700 | 127.0.0.1 | 7276
Seeking to partition beginning in data file | 20:16:26,700 | 127.0.0.1 | 7279
Bloom filter allows skipping sstable 33191 | 20:16:26,700 | 127.0.0.1 | 7363
Key cache hit for sstable 33190 | 20:16:26,700 | 127.0.0.1 | 7374
Seeking to partition beginning in data file | 20:16:26,700 | 127.0.0.1 | 7377
Bloom filter allows skipping sstable 33189 | 20:16:26,700 | 127.0.0.1 | 7463
Key cache hit for sstable 33186 | 20:16:26,700 | 127.0.0.1 | 7474
Seeking to partition beginning in data file | 20:16:26,700 | 127.0.0.1 | 7477
Key cache hit for sstable 33183 | 20:16:26,700 | 127.0.0.1 | 7563
Seeking to partition beginning in data file | 20:16:26,700 | 127.0.0.1 | 7567
Bloom filter allows skipping sstable 33182 | 20:16:26,701 | 127.0.0.1 | 7663
Bloom filter allows skipping sstable 33180 | 20:16:26,701 | 127.0.0.1 | 7672
Bloom filter allows skipping sstable 33178 | 20:16:26,701 | 127.0.0.1 | 7679
Bloom filter allows skipping sstable 33177 | 20:16:26,701 | 127.0.0.1 | 7686
Maybe most important is the end of the trace:
Merging data from memtables and 277 sstables | 20:21:29,186 | 127.0.0.1 | 607001
Read 3 live and 0 tombstoned cells | 20:21:29,186 | 127.0.0.1 | 607205
Request complete | 20:21:29,186 | 127.0.0.1 | 607714
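To get a quick overview of a trace like this, here is a minimal sketch that counts how many sstables were read versus skipped. It assumes the trace is available as plain text in the pipe-separated layout shown above; `summarize_trace` is a hypothetical helper, and counting each "Key cache hit" line as one sstable read is a simplification (sstables read after a key cache miss would not be counted).

```python
import re
from collections import Counter


def summarize_trace(trace_text: str) -> Counter:
    """Count sstables read and skipped in a pipe-separated trace dump."""
    counts = Counter()
    for line in trace_text.splitlines():
        # The activity description is the first pipe-separated column.
        activity = line.split("|")[0].strip()
        if activity.startswith("Key cache hit for sstable"):
            counts["sstables_read"] += 1
        elif activity.startswith("Bloom filter allows skipping sstable"):
            counts["sstables_skipped"] += 1
        else:
            m = re.match(r"Merging data from memtables and (\d+) sstables", activity)
            if m:
                counts["sstables_merged"] = int(m.group(1))
    return counts


# A few lines copied from the trace above.
sample = """\
Key cache hit for sstable 33259 | 20:16:26,699 | 127.0.0.1 | 5830
Seeking to partition beginning in data file | 20:16:26,699 | 127.0.0.1 | 5833
Bloom filter allows skipping sstable 33256 | 20:16:26,699 | 127.0.0.1 | 5923
Merging data from memtables and 277 sstables | 20:21:29,186 | 127.0.0.1 | 607001
"""
print(dict(summarize_trace(sample)))
```

Run against the full trace, this shows at a glance how many sstables a single-partition read has to touch.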
Upvotes: 1
Views: 464
Reputation: 1297
Do look at tracing to confirm, but if sdb, sdc, and sdd are spinning disks, you are seeing the correct order of magnitude of tps and are very likely bound by random disk I/O on the read side.
If that is the case, then you only have two options (with any system, not specific to Cassandra):
Cassandra can do roughly 3k-5k operations (reads or writes) per second per CPU core, but only if the disk subsystem isn't the limiting factor.
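As a rough cross-check of the disk-bound hypothesis, here is a small sketch that divides the combined tps of the three data disks by the measured query throughput. It assumes the iostat sample was taken while the benchmark was running, which the question does not confirm; `disk_ops_per_query` is a hypothetical helper name.

```python
# tps per data disk, from the iostat output in the question.
tps_by_disk = {"sdb": 135.0, "sdc": 149.0, "sdd": 124.0}

# Measured throughput samples (queries per second) from the question.
throughputs = [61, 57, 52]


def disk_ops_per_query(tps: dict, queries_per_second: float) -> float:
    """Average number of disk transfers issued per query, cluster-wide."""
    return sum(tps.values()) / queries_per_second


avg_qps = sum(throughputs) / len(throughputs)
ops = disk_ops_per_query(tps_by_disk, avg_qps)
print(f"{sum(tps_by_disk.values()):.0f} tps / {avg_qps:.1f} qps = {ops:.1f} disk ops per query")
```

Under that assumption, each query costs several physical disk operations, so three disks delivering a few hundred tps in total can only sustain a few tens of queries per second, whatever the CPU could handle.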
Upvotes: 2