Reputation: 5671
I recently built a Hadoop (Cloudera) cluster and a Cassandra cluster with 2 nodes. I would now like to run some benchmarks and collect data about resource usage. I've searched a lot and found HiBench and the Cassandra stress tool. I don't want to compare against other systems; I want to measure my own, but it is hard to see how to get realistic, correct values. The cluster consists of 2 virtual machines created with KVM, and Cassandra runs in Docker containers. It is hard to know how to analyze this system without getting false results.
Upvotes: 1
Views: 348
Reputation: 346
You can use the Yahoo! Cloud Serving Benchmark (YCSB) to benchmark your Cassandra cluster. Below are links to the project page and the corresponding Git repository.
https://research.yahoo.com/news/yahoo-cloud-serving-benchmark/
https://github.com/brianfrankcooper/YCSB
The benchmark is quite flexible and has many parameters that you can change to explore the cluster's behavior and properties. One key drawback of the framework, however, is that it uses random data by default. You can modify the code to load your own data, and then it should probably suit your needs.
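As a rough illustration of the parameters mentioned above, here is a minimal Python sketch that assembles YCSB command lines for the load and run phases against Cassandra. The host names, counts, and thread count are placeholder assumptions. `cassandra-cql`, `hosts`, `recordcount`, and `operationcount` are the binding and property names as I understand them from the YCSB repository, so check them against the version you install.

```python
def ycsb_command(phase, hosts, workload="workloads/workloada",
                 recordcount=1_000_000, operationcount=1_000_000, threads=8):
    """Build a YCSB command line (as an argument list) for one phase.

    All default values here are illustrative assumptions, not recommendations.
    """
    if phase not in ("load", "run"):
        raise ValueError("phase must be 'load' or 'run'")
    return [
        "bin/ycsb", phase, "cassandra-cql",      # Cassandra CQL binding
        "-P", workload,                          # base workload properties file
        "-p", "hosts=" + ",".join(hosts),        # Cassandra contact points
        "-p", f"recordcount={recordcount}",      # rows inserted during 'load'
        "-p", f"operationcount={operationcount}",  # operations issued during 'run'
        "-threads", str(threads),                # client threads
        "-s",                                    # print status while running
    ]


if __name__ == "__main__":
    nodes = ["node1", "node2"]  # hypothetical host names
    for phase in ("load", "run"):
        print(" ".join(ycsb_command(phase, nodes)))
```

The sketch only prints the commands so you can review them before running anything; the load phase has to finish before the run phase is meaningful.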
Upvotes: 1
Reputation: 8812
Some remarks:
"Cluster consists of 2 virtual machines, created with KVM"
Don't use virtual machines if you want to benchmark performance. Cassandra writes to disk sequentially to optimize scan operations. With virtual machines on shared disks, the benefit of sequential writes is lost, because the hypervisor can reorder contiguous data and spread it across different disk sectors, which undoes the optimization for sequential scans.
One alternative is to make sure each VM has its own dedicated disk.
If you're not benchmarking performance, ignore the comment above.
My second piece of advice: use a real data set, e.g. a large one that does not fit into memory, so that you can see how each technology behaves. Read this for more details: http://www.nextplatform.com/2016/02/19/the-myth-of-in-memory-computing/
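To make "does not fit into memory" concrete, here is a small Python sketch that estimates how many records you would need for the raw data to exceed the cluster's total RAM by a chosen factor. The 8 GiB per node, the factor of 3, and the roughly 1 KB record size are assumptions for illustration only, and the estimate ignores replication, compression, and storage overhead.

```python
import math


def records_to_exceed_memory(ram_bytes_per_node, nodes, record_bytes, factor=3.0):
    """Return the number of records whose raw size is `factor` times total RAM."""
    if ram_bytes_per_node <= 0 or nodes <= 0 or record_bytes <= 0 or factor <= 0:
        raise ValueError("all arguments must be positive")
    target_bytes = ram_bytes_per_node * nodes * factor
    return math.ceil(target_bytes / record_bytes)


if __name__ == "__main__":
    gib = 1024 ** 3
    # Assumed setup: 2 nodes with 8 GiB RAM each, about 1 KB per record.
    count = records_to_exceed_memory(8 * gib, nodes=2, record_bytes=1000)
    print(f"record count to aim for: {count}")
```

The result can serve as a starting point for the record count in whichever load tool you use; measure the actual on-disk size afterwards, since it will differ from this raw estimate.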
Upvotes: 1