Reputation: 3
I'm testing a single-node DataStax Cassandra 2.0 installation with the default configuration, using a client written with Astyanax.
In my scenario there is one CF. Each row contains a key (a natural number converted to a string) and one column that holds 1 kB of random text data.
The client inserts rows until the data size reaches 50 GB. It does this at 3000 req/sec, which is enough for me. The next step is to read all of this data back, in the same order in which it was inserted. This is where the problems start. Here is an example log produced by my program:
reads writes time req/sec
99998 0 922,59 108
100000 0 508,51 196
100000 0 294,85 339
100000 0 195,99 510
100000 0 137,11 729
100000 0 105,48 948
100000 0 105,83 944
100000 0 76,05 1314
100000 0 71,94 1389
100000 0 63,34 1578
100000 0 63,91 1564
100000 0 65,69 1522
100000 0 1217,52 82
100000 0 725,67 137
100000 0 502,03 199
100000 0 342,17 292
100000 0 336,83 296
100000 0 332,56 300
100000 0 330,27 302
100000 0 359,74 277
100000 0 320,01 312
100000 0 369,02 270
100000 0 774,47 129
100000 0 564,81 177
100000 0 729,50 137
100000 0 656,28 152
100000 0 611,29 163
100000 0 589,29 169
100000 0 693,99 144
100000 0 658,12 151
100000 0 294,53 339
100000 0 126,81 788
100000 0 206,13 485
100000 0 924,29 108
The throughput is unstable and rather low.
I'd appreciate any help that might improve read time. I can also provide more information if needed.
Thanks for help!
Kuba
Upvotes: 0
Views: 1158
Reputation: 5670
I'm guessing you are doing your reads sequentially. If you do them in parallel, you should be able to do many more operations per second.
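As a rough illustration of issuing reads in parallel while still getting results back in insertion order, here is a minimal Java sketch. It is not Astyanax code: ParallelReader, readAll, and the fetch function are hypothetical names, and fetch stands in for whatever single-row read call your client already makes. The thread count is an assumption you would need to tune.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: runs a caller-supplied single-row read on a thread pool.
final class ParallelReader {

    // Reads every key using `threads` workers; results come back in key order.
    // `fetch` is a placeholder for the real client call (an assumption here).
    static <K, V> List<V> readAll(List<K> keys, Function<K, V> fetch, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit all reads first so they can run concurrently.
            List<Future<V>> pending = new ArrayList<>(keys.size());
            for (K key : keys) {
                Callable<V> task = () -> fetch.apply(key);
                pending.add(pool.submit(task));
            }
            // Collect in submission order, which preserves insertion order.
            List<V> results = new ArrayList<>(keys.size());
            for (Future<V> future : pending) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With tens of millions of keys you would call this on one batch of keys at a time rather than on the whole key list, so that the pending futures and results do not all sit in memory at once.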
Update to address single read latency:
Read latency can be affected by the following variables:
There are a number of tools that can help you answer these questions, some specific to Cassandra and others general system performance tools. Look in the Cassandra logs for GC pauses and for dropped requests. Look at nodetool cfstats to see latency stats. Use nodetool cfhistograms to check latency distributions, the number of sstables hit per read, and row size distribution. Use nodetool tpstats to check for dropped requests and queue sizes. You can also use tools like iostat and vmstat to see disk and system utilization stats.
Upvotes: 2