I have just inherited a system with 3 nodes: 2 in one datacenter with a replication factor of 2, and 1 in a second datacenter with a replication factor of 1. The system was upgraded from Cassandra 3.9 to Cassandra 3.11.3. Since the upgrade, any query in cqlsh returns the error:
ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'received_responses': 0, 'required_responses': 2, 'consistency': 'LOCAL_QUORUM'}
Can anyone suggest what might be causing this problem or where I should look to identify the problem?
Edit: I retried my query with a consistency level of ONE but still received the error:
ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}
Getting too long for comments...
There are a few things that could be causing this.
1 - How large is the largest partition? I'd check that with the following:
bin/nodetool tablestats yourKeyspaceName.ablog | grep "partition maximum"
The value is reported in bytes ("Compacted partition maximum bytes"). If it comes back in the double-digit GB range (roughly 10,000,000,000 bytes or more), you're in trouble.
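If you want to script that check across nodes, here is a minimal Python sketch that pulls the value out of `nodetool tablestats` output and flags it. It assumes the 3.11-style line `Compacted partition maximum bytes: N`; the function names and the 10 GiB cut-off are illustrative choices, not anything nodetool provides.

```python
import re
from typing import Optional

# Hypothetical cut-off: "double-digit GB" read as 10 GiB or more.
LARGE_PARTITION_BYTES = 10 * 1024 ** 3

# Matches tablestats lines such as "Compacted partition maximum bytes: 4055269".
_MAX_PARTITION_RE = re.compile(r"Compacted partition maximum bytes:\s*(\d+)")


def max_partition_bytes(tablestats_output: str) -> Optional[int]:
    """Return the largest 'Compacted partition maximum bytes' value found, or None."""
    values = [int(v) for v in _MAX_PARTITION_RE.findall(tablestats_output)]
    return max(values) if values else None


def has_oversized_partition(tablestats_output: str) -> bool:
    """True if the reported maximum partition size is at or above the cut-off."""
    largest = max_partition_bytes(tablestats_output)
    return largest is not None and largest >= LARGE_PARTITION_BYTES
```

Feed it the text of `bin/nodetool tablestats yourKeyspaceName.ablog`, for example via `subprocess.run([...], capture_output=True, text=True).stdout`, on each node in turn.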
2 - Are there any tombstones? You can check that with a similar command:
bin/nodetool tablestats yourKeyspaceName.ablog | grep "tombstones"
If the tombstones-per-slice figures come back in the 3- or 4-digit range (hundreds or thousands), that could be a problem.
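The same idea works for tombstones. This Python sketch collects every tablestats line mentioning "tombstones" (in 3.11 these are the average and maximum tombstones per slice over the last five minutes) and flags values of 100 or more. The cut-off and function names are hypothetical, and labels are assumed unique, i.e. the output covers a single table as in the command above.

```python
import math
from typing import Dict

# Hypothetical cut-off: "3 or 4 digits" read as 100 or more tombstones per slice.
TOMBSTONE_CONCERN = 100.0


def tombstone_stats(tablestats_output: str) -> Dict[str, float]:
    """Collect every 'tombstones' metric line as {label: value}, skipping non-numeric or NaN values."""
    stats: Dict[str, float] = {}
    for line in tablestats_output.splitlines():
        if "tombstones" not in line or ":" not in line:
            continue
        # Split on the last colon: label on the left, value on the right.
        label, _, raw = line.rpartition(":")
        try:
            value = float(raw.strip())
        except ValueError:
            continue
        if not math.isnan(value):
            stats[label.strip()] = value
    return stats


def worrying_tombstones(tablestats_output: str) -> Dict[str, float]:
    """Subset of tombstone metrics at or above the cut-off."""
    return {k: v for k, v in tombstone_stats(tablestats_output).items() if v >= TOMBSTONE_CONCERN}
```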
3 - Downgrade to 3.11.2. Versions 3.11.2 and 3.11.3 use the same SSTable format, so it's just a matter of switching out the binaries: download and untar 3.11.2, move the conf dir over from the 3.11.3 directory, and it should be fine.
I only suggest this because you could be running into CASSANDRA-14672.
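The binary swap mostly comes down to carrying the configuration across. Below is a minimal Python sketch of just that step. It assumes the node has already been drained (`nodetool drain`) and stopped, that both installs are unpacked side by side, and that `data_file_directories`, `commitlog_directory` and the related paths in `cassandra.yaml` point outside the 3.11.3 install. If those are unset, a tarball install keeps its data under `$CASSANDRA_HOME/data`, which this sketch does not move. The function name `carry_over_conf` is hypothetical.

```python
import shutil
from pathlib import Path


def carry_over_conf(old_home: Path, new_home: Path) -> Path:
    """Replace new_home/conf with a copy of old_home/conf.

    The stock conf shipped with the new install is kept as conf.orig,
    and old_home is left untouched so you can roll back.
    """
    old_conf = old_home / "conf"
    new_conf = new_home / "conf"
    if not old_conf.is_dir():
        raise FileNotFoundError(f"no conf directory in {old_home}")
    if new_conf.exists():
        backup = new_home / "conf.orig"
        if backup.exists():
            shutil.rmtree(backup)
        new_conf.rename(backup)
    shutil.copytree(old_conf, new_conf)
    return new_conf
```

This copies rather than moves, which departs slightly from "move the conf dir" but keeps the 3.11.3 tree intact as a fallback.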
4 - LOCAL_QUORUM with RF=2. As I mentioned in the comments, querying at LOCAL_QUORUM with an RF < 3 is no different from querying at ALL. Cassandra computes a quorum (majority) as follows, using integer division:
QUORUM = (RF / 2) + 1 = (2 / 2) + 1 = 2 (replicas need to respond)
Seriously, you're not gaining anything by doing this. It only makes sense when you have an RF of 3 or more:
QUORUM = (RF / 2) + 1 = (3 / 2) + 1 = 2 (replicas need to respond)
In fact, querying at QUORUM (or LOCAL_QUORUM) with RF=2 actually hurts you, because you cannot tolerate even a single replica being down.
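To make the arithmetic above concrete, this short Python sketch computes the quorum size and how many replicas can be down for RF 1 through 5. For LOCAL_QUORUM the RF to plug in is the local datacenter's, so RF=2 gives the `required_responses: 2` seen in the question's error. The function names are just for illustration.

```python
def quorum(rf: int) -> int:
    """Replicas that must respond at QUORUM/LOCAL_QUORUM: a majority, using integer division."""
    return rf // 2 + 1


def tolerable_failures(rf: int) -> int:
    """Replicas that can be down while a quorum request still succeeds."""
    return rf - quorum(rf)


for rf in range(1, 6):
    print(f"RF={rf}: quorum={quorum(rf)}, can lose {tolerable_failures(rf)} replica(s)")
```

For RF=2 this prints `quorum=2, can lose 0 replica(s)`, while RF=3 gives `quorum=2, can lose 1 replica(s)`: the extra replica is what buys you fault tolerance.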