Sporadic Cassandra WriteErrors when using Lightweight Transactions

Question

I have a service that connects to our Cassandra cluster and executes tens of thousands of queries per day using Lightweight (ACID) Transactions to implement the consensus system described here. For the most part it works fine, but sporadically a write fails with an error saying "Operation timed out - received only 1 responses" (or, less commonly, only 0 responses). We're using the DataStax Python driver. When the error occurs, the last line of the stack trace reads:

WriteTimeout: Error from server: code=1100 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 1 responses." info={'received_responses': 1, 'required_responses': 2, 'consistency': 'LOCAL_SERIAL'}
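
For reference when reading the `info` dict: `required_responses: 2` matches plain quorum arithmetic, since `LOCAL_SERIAL` needs a quorum of the replicas in the local datacenter, i.e. floor(RF / 2) + 1. The sketch below only illustrates that arithmetic. The keyspace's per-datacenter replication factor is not stated here, so the values tried are assumptions, and `quorum` is a hypothetical helper, not part of the driver.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond for a quorum: floor(RF / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


# Hypothetical per-datacenter replication factors; the real keyspace
# settings are not given in this post.
for rf in (1, 2, 3, 4, 5):
    print(f"local RF={rf}: LOCAL_SERIAL needs {quorum(rf)} responses")
```

Under that assumption, a requirement of 2 responses would correspond to a local replication factor of 2 or 3, and "received only 1" means one local replica short of a quorum answered before the coordinator's timeout.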

Is this expected to happen from time to time in a production Cassandra setup, or does it point to a configuration problem with our Cassandra cluster or network?

Some information about our Cassandra cluster: it is an 8-node setup spread across 2 Amazon EC2 regions (4 nodes per region). All of the nodes are running version 3.3.0 of the DataStax Cassandra distribution.
