Using Cassandra with a simple topology:
One node (select count() over 1,000,000 rows): 18.524 s
6 nodes (select count() over 1,000,000 rows): 30.000 s
The 6-node setup uses NetworkTopologyStrategy, with replication factor 1 and consistency level ONE. I don't understand why adding nodes doesn't improve Cassandra's performance.
Cassandra is a distributed system, and its performance scales only when you use queries that target a specific node (i.e., a single partition). In your example, count requires the query to be sent to all nodes; the partial results then have to be collected on the coordinating node before being returned to the caller. Count in Cassandra should be used only within a single partition. If you need to count something across multiple partitions, look in the direction of Spark, etc.
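A toy Python model may make the difference concrete. This is a hypothetical sketch, not Cassandra's or any driver's API: the `ToyCluster` class, its method names, and the crc32-modulo placement are all assumptions standing in for Cassandra's Murmur3 token ring, with replication factor 1 as in the question. A single-partition count touches one node, while a table-wide count must visit every node and sum the partial results on the coordinator, so adding nodes adds fan-out rather than taking work away from that query.

```python
import zlib
from collections import defaultdict


class ToyCluster:
    """Hypothetical toy model of a Cassandra-like cluster with replication factor 1.

    Each row lives on exactly one node, chosen by hashing its partition key.
    Simplification (assumption): real Cassandra uses Murmur3 tokens on a ring.
    """

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        # node index -> partition key -> list of rows stored on that node
        self.nodes = [defaultdict(list) for _ in range(num_nodes)]

    def owner(self, partition_key):
        # Deterministic stand-in for token-based placement.
        return zlib.crc32(str(partition_key).encode()) % self.num_nodes

    def insert(self, partition_key, row):
        self.nodes[self.owner(partition_key)][partition_key].append(row)

    def count_in_partition(self, partition_key):
        # Single-partition count: with RF=1, only the owning node is needed.
        node = self.owner(partition_key)
        return len(self.nodes[node].get(partition_key, [])), {node}

    def count_all(self):
        # Table-wide count: every node scans its local data and the
        # coordinator sums the partial counts before answering the client.
        contacted = set()
        total = 0
        for index, node in enumerate(self.nodes):
            contacted.add(index)
            total += sum(len(rows) for rows in node.values())
        return total, contacted


cluster = ToyCluster(num_nodes=6)
for i in range(1000):
    cluster.insert(partition_key=i % 50, row={"id": i})

rows, nodes = cluster.count_in_partition(7)
print(rows, len(nodes))  # 20 1
total, nodes = cluster.count_all()
print(total, len(nodes))  # 1000 6
```

In a real cluster each contacted node also does disk I/O and network round trips, so the table-wide count pays for the slowest node plus the coordinator's aggregation, which is consistent with the 6-node timing in the question.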
I would recommend taking the DS201 and DS220 courses on DataStax Academy to get a better understanding of how Cassandra works and how to model data for it.