RADU

Reputation: 108

Cassandra starts complaining about write consistency with all nodes up and normal (UN)

We have two separate Cassandra clusters that have nothing to do with each other and are miles apart. Both run Cassandra 4.1, and at about the same time (two weeks apart) both stopped writing data with this error:

Server timeout during batchlog write at consistency TWO (1 peer(s) acknowledged the write over 2 required)

The 3 nodes in each cluster show Up/Normal, so we had no clue about what was going wrong. Luckily, we found that if we stopped one of the non-seed nodes, the writes resumed and everything flowed normally again; but if we started that node again, the error reappeared and the writes stopped.

In order to have all nodes up and normal, we have to restart all machines; after that, both clusters run normally again.

Has anyone seen this behavior? We have a 3.11 cluster that has been running for about 4 years with only minor issues; these two clusters are both about 4 months old.

EDIT: adding `nodetool describecluster` output

Cluster Information:
        Name: ConsMdm
        Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
        DynamicEndPointSnitch: enabled
        Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
        Schema versions:
                3a47c6e5-4d0b-3386-b361-515d4522bc19: [192.168.30.5, 192.168.30.6, 192.168.30.7]

Stats for all nodes:
        Live: 3
        Joining: 0
        Moving: 0
        Leaving: 0
        Unreachable: 0

Data Centers:
        dc1 #Nodes: 3 #Down: 0

Database versions:
        4.1.5: [192.168.30.5:7000, 192.168.30.6:7000, 192.168.30.7:7000]

Keyspaces:
        system_auth -> Replication class: NetworkTopologyStrategy {dc1=3}
        mdm -> Replication class: NetworkTopologyStrategy {dc1=3}
        system_distributed -> Replication class: SimpleStrategy {replication_factor=3}
        system_traces -> Replication class: SimpleStrategy {replication_factor=2}
        system_schema -> Replication class: LocalStrategy {}
        system -> Replication class: LocalStrategy {}

Upvotes: 1

Views: 35

Answers (1)

AmirModiri

Reputation: 775

After bringing the node back, if you run

nodetool describecluster

you will probably see a schema mismatch. On the node whose schema version disagrees, you should drop and rebuild the local schema with

nodetool resetlocalschema

After the schema fully syncs, your problem should be solved.
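The mismatch check can be sketched as a small shell snippet that counts distinct schema-version UUIDs in the `describecluster` output. The sample output below is hypothetical (a made-up second UUID to show a mismatch); in practice you would pipe the real `nodetool describecluster` output instead:

```shell
# Hypothetical sample of `nodetool describecluster` output with TWO
# different schema versions, i.e. a schema mismatch. In practice, replace
# this variable with the output of the real command.
sample='Schema versions:
        3a47c6e5-4d0b-3386-b361-515d4522bc19: [192.168.30.5, 192.168.30.6]
        8f2b1c0d-0000-1111-2222-333344445555: [192.168.30.7]'

# Count lines that start with a schema-version UUID followed by a colon;
# more than one means the nodes disagree on the schema.
versions=$(printf '%s\n' "$sample" \
  | grep -cE '^[[:space:]]*[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}:')
echo "$versions"
```

If the count is greater than 1, that identifies which node(s) to run `nodetool resetlocalschema` on: the one(s) listed under the minority schema version.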

Upvotes: 0
