I was studying up on Cassandra, and I understand that it is a peer-to-peer database with no masters or slaves.
Each read or write is handled by a coordinator node, which forwards the request to the appropriate replica nodes using the replication strategy and the snitch.
My question is about the performance implications of this approach.
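The coordinator's routing step can be pictured as a walk around the token ring. This is a minimal sketch assuming SimpleStrategy-style placement (no snitch, rack, or datacenter awareness), small integer tokens for readability, and an MD5 stand-in hash; Cassandra's default Murmur3Partitioner hashes differently, and `token_for` and `replicas_for` are hypothetical names.

```python
import bisect
import hashlib


def token_for(partition_key: str) -> int:
    """Map a partition key to a token (MD5 stand-in, not Murmur3)."""
    return int.from_bytes(hashlib.md5(partition_key.encode()).digest(), "big")


def replicas_for(token: int, ring: dict[int, str], rf: int) -> list[str]:
    """Return up to rf distinct nodes, walking clockwise from the token.

    A node whose token is T owns the range (previous token, T], so the
    first replica is the first ring position with a token >= the key's token.
    """
    tokens = sorted(ring)
    wanted = min(rf, len(set(ring.values())))
    i = bisect.bisect_left(tokens, token)  # may equal len(tokens); wraps below
    replicas: list[str] = []
    while len(replicas) < wanted:
        node = ring[tokens[i % len(tokens)]]
        if node not in replicas:  # skip repeats when a node holds several tokens
            replicas.append(node)
        i += 1
    return replicas


# A four-node ring with one token per node.
ring = {0: "A", 100: "B", 200: "C", 300: "D"}
print(replicas_for(150, ring, rf=2))  # ['C', 'D']
print(replicas_for(350, ring, rf=3))  # ['A', 'B', 'C'] (wraps around)
```

In this model, a coordinator receiving a write for key `k` would compute `replicas_for(token_for(k), ring, rf)` and forward the mutation to each node in the result, applying it locally if it is one of them.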
1) There will occasionally be an extra hop, but your driver will most likely use a TokenAware load-balancing policy to select the coordinator, which picks a replica of the given partition as the coordinator.
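The effect of token-aware coordinator selection can be shown with a toy hop count. This is a sketch under simplifying assumptions: a fixed four-node cluster, the partition's replicas passed in directly, and hypothetical picker functions; real driver policies wrap a child policy (for example a datacenter-aware round robin) and fall back to it when the routing key is unknown.

```python
NODES = ["A", "B", "C", "D"]


def hops_to_first_replica(coordinator: str, replicas: list[str]) -> int:
    """Network hops from the client until a replica holds the write."""
    # client -> coordinator is one hop; a non-replica coordinator must
    # forward to a replica, which costs a second hop.
    return 1 if coordinator in replicas else 2


def pick_round_robin(request_no: int, replicas: list[str]) -> str:
    """Rotate over every node, ignoring who owns the partition."""
    return NODES[request_no % len(NODES)]


def pick_token_aware(request_no: int, replicas: list[str]) -> str:
    """Rotate over the partition's replicas only."""
    return replicas[request_no % len(replicas)]


def average_hops(picker, requests: list[list[str]]) -> float:
    total = sum(
        hops_to_first_replica(picker(i, replicas), replicas)
        for i, replicas in enumerate(requests)
    )
    return total / len(requests)


# Four writes to a partition replicated on C and D (RF = 2).
requests = [["C", "D"]] * 4
print(average_hops(pick_round_robin, requests))  # 1.5
print(average_hops(pick_token_aware, requests))  # 1.0
```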
2) The write is buffered, and depending on your consistency level you will not receive an acknowledgment of the write until it has been accepted by multiple nodes. For example, with consistency level ONE you will receive an ACK as soon as the write has been accepted by a single node. The writes to the other replicas are still queued up and delivered, but you will not receive any information about them. If one of those writes fails or cannot be delivered, a hint is stored on the coordinator and delivered when the replica comes back online. There is a limit to the number of hints that can be kept, so after long downtimes you should run repair.
With higher consistency levels, the client will not receive an acknowledgment until the number of replicas required by the CL have accepted the write.
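The acknowledgment and hinting behavior described above can be modeled roughly as follows. This is a simplified sketch, not Cassandra's implementation: it covers only ONE, TWO, THREE, QUORUM and ALL, uses list order as a stand-in for the order in which replica responses arrive, and ignores timeouts, the hint window, and the ANY and LOCAL_* levels. The helper names are hypothetical.

```python
from dataclasses import dataclass


def required_acks(level: str, rf: int) -> int:
    """Replica acknowledgments the coordinator waits for before replying."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return fixed[level]
    if level == "QUORUM":
        return rf // 2 + 1
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported level in this sketch: {level}")


@dataclass
class WriteOutcome:
    success: bool
    acked_before_reply: list[str]  # replicas the client effectively waited on
    written_in_background: list[str]  # still written, client is not told
    hinted: list[str]  # down replicas: the coordinator keeps a hint


def coordinate_write(replicas: list[str], alive: set[str], level: str) -> WriteOutcome:
    needed = required_acks(level, len(replicas))
    live = [r for r in replicas if r in alive]
    if len(live) < needed:
        # Too few live replicas: the request is rejected up front
        # (an Unavailable error), so nothing is written or hinted.
        return WriteOutcome(False, [], [], [])
    down = [r for r in replicas if r not in alive]
    # List order stands in for response order: the first `needed` acks win.
    return WriteOutcome(True, live[:needed], live[needed:], down)


out = coordinate_write(["A", "B", "C"], alive={"A", "B"}, level="ONE")
print(out.success, out.hinted)  # True ['C']
```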
3) Performance should scale with the total number of replica writes. If a cluster can sustain a net 10k writes per second and has RF = 2, you can most likely do only about 5k client writes per second, since every write is actually two. This happens regardless of your consistency level, because the replica writes are sent even when you aren't waiting for their acknowledgments.
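The arithmetic in point 3 fits in one line of code. A minimal sketch, assuming the cluster's capacity is expressed as replica writes per second; the function name is hypothetical.

```python
def max_client_writes_per_sec(cluster_replica_writes_per_sec: float, rf: int) -> float:
    """Client-visible write throughput when every write lands on rf replicas.

    The consistency level is deliberately not a parameter: it changes how
    many acknowledgments the client waits for, not how many replicas write.
    """
    if rf < 1:
        raise ValueError("rf must be at least 1")
    return cluster_replica_writes_per_sec / rf


print(max_client_writes_per_sec(10_000, rf=2))  # 5000.0
print(max_client_writes_per_sec(10_000, rf=3))  # roughly 3333
```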
4) There is really no way to get around the coordination. The token-aware policy will pick a good coordinator, which is basically the best you can do. If you manually attempted to write to each replica, your write would still be replicated by each node that received the request, so instead of one coordination event you would get N. This is also most likely a bad idea, since I would assume the network between your C* nodes is better than the one between your client and the C* nodes.
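The cost of client-side fan-out in point 4 can be counted under a simple model: each node that receives a client request coordinates it and sends the mutation to all RF replicas, itself included if it is one. This is an illustrative sketch with a hypothetical function name; it ignores that the duplicate mutations carry the same data.

```python
def write_cost(rf: int, client_fan_out: bool) -> dict[str, int]:
    """Count the work for one logical write under the simplified model."""
    client_requests = rf if client_fan_out else 1
    coordinations = client_requests  # every receiving node coordinates
    replica_mutations = coordinations * rf  # each coordination reaches rf replicas
    return {
        "client_requests": client_requests,
        "coordinations": coordinations,
        "replica_mutations": replica_mutations,
    }


print(write_cost(3, client_fan_out=False))
# {'client_requests': 1, 'coordinations': 1, 'replica_mutations': 3}
print(write_cost(3, client_fan_out=True))
# {'client_requests': 3, 'coordinations': 3, 'replica_mutations': 9}
```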
To add to Andrew's response: don't assume the coordinator hop is going to cause significant latency. Run your queries and measure. Think about consistency levels more than about the extra hop. Tune your consistency for higher read speed, higher write speed, or a balance of the two. Then MEASURE. If you find the latencies unacceptable, you may need to adjust your consistency levels and/or change your data model.
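One common way to reason about the read/write trade-off while tuning is the overlap rule: if the number of replicas a read consults plus the number a write must reach exceeds RF, every read overlaps at least one replica holding the latest acknowledged write. A minimal sketch with hypothetical helper names, modeling only ONE, QUORUM and ALL; it says nothing about actual latency, which still has to be measured.

```python
from itertools import product


def replicas_needed(level: str, rf: int) -> int:
    return {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[level]


def reads_see_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    """True when every read replica set must intersect every write replica set."""
    return replicas_needed(read_level, rf) + replicas_needed(write_level, rf) > rf


levels = ["ONE", "QUORUM", "ALL"]
for read_cl, write_cl in product(levels, repeat=2):
    ok = reads_see_latest_write(read_cl, write_cl, rf=3)
    print(f"read {read_cl:6} write {write_cl:6} -> {'overlap' if ok else 'may be stale'}")
```

Writing at ALL lets reads use ONE (fast reads, but slower and less available writes), while QUORUM on both sides balances the two.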
I don't have answers for 2 and 3, but as for 1 and 4:
1) Yes, this can cause an extra hop.
4) Yes, kind of. The DataStax driver, as well as Netflix's Astyanax driver, can be configured to be token aware, which means the driver tracks which nodes own which token ranges and sends the insert to a node that will store the data, using it as the coordinator. This eliminates the additional network hop.