Devin

Sporadic Cassandra WriteErrors when using Lightweight Transactions

I have a service that connects to our Cassandra cluster and executes tens of thousands of queries per day using Lightweight Transactions (Cassandra's Paxos-based compare-and-set) to implement the consensus system described here. For the most part it works fine, but sporadically the writes fail with an error saying "Operation timed out - received only 1 responses" (or, less commonly, "received only 0 responses"). We're using the DataStax Python driver. When the error occurs, the full error line at the end of the stack trace reads:

WriteTimeout: Error from server: code=1100 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 1 responses." info={'received_responses': 1, 'required_responses': 2, 'consistency': 'LOCAL_SERIAL'}

Is this expected to happen occasionally in a production Cassandra setup, or does it suggest a configuration problem with our Cassandra cluster or network?

Some information about our Cassandra cluster: it is an 8-node setup spread across 2 Amazon EC2 regions (4 nodes per region). All of the nodes run version 3.3.0 of the DataStax Cassandra distribution.
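The `'required_responses': 2` with `'consistency': 'LOCAL_SERIAL'` in the error corresponds to a Paxos quorum inside the local datacenter, i.e. floor(RF / 2) + 1 of that datacenter's replicas. The replication factor isn't stated, so the sketch below (helper names are illustrative) just works out which per-datacenter replication factors, up to the 4 nodes per region, are consistent with a requirement of 2. With RF 3 one slow replica in the region is tolerated; with RF 2 a single slow replica is enough to cause a timeout.

```python
from typing import List


def local_quorum(rf: int) -> int:
    """Replicas in one datacenter that must respond for LOCAL_QUORUM / LOCAL_SERIAL."""
    if rf < 1:
        raise ValueError("replication factor must be >= 1")
    return rf // 2 + 1


def rfs_matching(required_responses: int, max_rf: int = 4) -> List[int]:
    """Per-DC replication factors (1..max_rf) whose quorum equals required_responses."""
    return [rf for rf in range(1, max_rf + 1) if local_quorum(rf) == required_responses]


print(rfs_matching(2))  # [2, 3]
```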

Answers (1)

Michal

From https://issues.apache.org/jira/browse/CASSANDRA-9328

There are cases where, under contention, the coordinator loses track of whether the value it submitted to Paxos might be applied or not (see CASSANDRA-6013). At that point we can't do anything other than answer "sorry, I don't know". And since a WriteTimeoutException already means "I don't know", we throw it in that case too, even though it's not a proper timeout per se.
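So a WriteTimeout on a conditional write is ambiguous: the proposal may or may not have been applied. One common way to resolve it is to read the row back at SERIAL or LOCAL_SERIAL consistency, which in Cassandra finishes any in-progress Paxos round before returning, and compare what is stored with what you proposed. The sketch below is a hypothetical, driver-agnostic helper: `CasOutcome`, `cas_with_recovery`, `cas_write` and `serial_read` are illustrative names, not DataStax driver APIs. With the DataStax Python driver you would pass `(cassandra.WriteTimeout,)` as `timeout_types` and implement the callables with `session.execute(...)`, setting `serial_consistency_level` on the conditional write and `ConsistencyLevel.LOCAL_SERIAL` as the read's consistency level. It also assumes each client proposes a unique value (for example one containing its node ID), so finding that value stored means its own write won.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple, Type


@dataclass
class CasOutcome:
    applied: bool                  # did our proposed value end up stored?
    current_value: Optional[str]   # value observed after the operation
    was_ambiguous: bool            # True if the CAS write timed out and we re-read


def cas_with_recovery(
    cas_write: Callable[[], bool],
    serial_read: Callable[[], Optional[str]],
    proposed_value: str,
    timeout_types: Tuple[Type[BaseException], ...],
) -> CasOutcome:
    """Hypothetical helper: run a conditional (CAS) write and, on a timeout,
    learn the real outcome by re-reading at SERIAL/LOCAL_SERIAL consistency."""
    try:
        applied = cas_write()
    except timeout_types:
        # The coordinator answered "I don't know". A serial read completes any
        # in-flight Paxos round first, so it observes the committed state.
        # If this read also times out, that exception propagates to the caller.
        current = serial_read()
        # Assumes proposed_value is unique to this client, so equality means
        # our write is the one that was applied.
        return CasOutcome(current == proposed_value, current, True)

    if applied:
        return CasOutcome(True, proposed_value, False)
    # The condition failed cleanly: another client's value is in place.
    return CasOutcome(False, serial_read(), False)
```

When the re-read shows a different value, the write can be retried or treated as lost, since a retried conditional write re-evaluates its condition against the committed state.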
