code=1200 error after upgrading Cassandra to 3.11.3

Question

I have just inherited a system with 3 nodes: 2 in one datacenter with a replication factor of 2, and 1 in a second datacenter with a replication factor of 1. The system was upgraded from Cassandra 3.9 to Cassandra 3.11.3. Since the upgrade, any query in cqlsh returns the error:

ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'received_responses': 0, 'required_responses': 2, 'consistency': 'LOCAL_QUORUM'}

Can anyone suggest what might be causing this problem or where I should look to identify the problem?

Edit: I retried my query with a consistency level of ONE but still received the error:

ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}

Aaron · Accepted Answer

Getting too long for comments...

There are a few things that could be causing this.

1 - How large is the largest partition? I'd check that with the following:

bin/nodetool tablestats yourKeyspaceName.ablog | grep "partition maximum"

If this comes back with something in the double-digit GB range (the value is reported in bytes), you're in trouble.
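
Because tablestats prints that figure in bytes, it can be easier to judge after converting it. Here is a minimal sketch, assuming the matching line has the form `Compacted partition maximum bytes: N`; the helper name `max_partition_gib` and the sample value are made up for illustration.

```shell
# Hypothetical helper: reads tablestats output on stdin and prints the
# "partition maximum" value converted from bytes to GiB.
max_partition_gib() {
  awk -F': *' '/partition maximum/ { printf "%.2f\n", $2 / (1024 * 1024 * 1024) }'
}

# Sample line only; the number is invented for illustration.
echo "Compacted partition maximum bytes: 21474836480" | max_partition_gib
```

On a real node you would pipe the tablestats command above into the helper instead of the sample `echo`.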

2 - Are there any tombstones? You can check that with a similar command:

bin/nodetool tablestats yourKeyspaceName.ablog | grep "tombstones"

If that comes back with numbers in 3 or 4 digits, that could be a problem.

3 - Downgrading to 3.11.2. Versions 3.11.2 and 3.11.3 use the same SSTable format, so it's just a matter of switching out the binaries. Download/untar 3.11.2, move the conf dir in from the 3.11.3 directory, and it should be fine.

I only suggest this, because you could be running into CASSANDRA-14672.
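
The conf-directory step from point 3 might look roughly like the sketch below. The function name `carry_over_conf` and both directory arguments are hypothetical, and it copies rather than moves so the 3.11.3 install stays intact in case you need to switch back. It assumes the node has already been stopped and that 3.11.2 is already untarred.

```shell
# Hypothetical helper: carry the conf dir from the old install into the new one.
# $1 = existing install (e.g. the 3.11.3 directory), $2 = freshly untarred 3.11.2.
carry_over_conf() {
  old_home=$1
  new_home=$2
  if [ ! -d "$old_home/conf" ] || [ ! -d "$new_home" ]; then
    echo "usage: carry_over_conf OLD_HOME NEW_HOME" >&2
    return 1
  fi
  # Keep the stock config that shipped with the tarball for reference.
  if [ -d "$new_home/conf" ]; then
    if [ -e "$new_home/conf.stock" ]; then
      echo "$new_home/conf.stock already exists, refusing to overwrite" >&2
      return 1
    fi
    mv "$new_home/conf" "$new_home/conf.stock" || return 1
  fi
  # Copy (not move) so the old install can still be started if needed.
  cp -Rp "$old_home/conf" "$new_home/conf"
}
```

You would call it with your own paths, for example `carry_over_conf /path/to/3.11.3 /path/to/3.11.2`, then start the node from the 3.11.2 directory.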

4 - LOCAL_QUORUM w/RF=2. As I mentioned in the comments, querying at LOCAL_QUORUM with an RF < 3 isn't any different from querying at ALL. Cassandra computes quorum (majority) as follows:

QUORUM = (RF / 2) + 1 = (2 / 2) + 1 = 2 (replicas need to respond)

Seriously, you're not gaining anything by doing this. It only makes sense when you have an RF of 3 or more:

QUORUM = (RF / 2) + 1 = (3 / 2) + 1 = 1 + 1 = 2 (integer division; replicas need to respond)

And actually, querying at QUORUM with RF=2 hurts you, as you cannot tolerate a single node being down.
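
To see the pattern across replication factors, here is a small sketch of the same arithmetic. The function name `quorum` is just for illustration; shell `$(( ))` arithmetic does integer division, which matches the formula above.

```shell
# Quorum for a given replication factor: (RF / 2) + 1, using integer division.
quorum() {
  rf=$1
  echo $(( rf / 2 + 1 ))
}

# Show the quorum and how many replicas can be down while still reaching it.
for rf in 1 2 3 5; do
  q=$(quorum "$rf")
  echo "RF=$rf quorum=$q tolerated_down=$(( rf - q ))"
done
```

With RF=1 or RF=2 the tolerated count comes out as 0, which is why quorum behaves like ALL there; RF=3 is the first value where one replica can be down.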
