Reputation: 11
We are encountering the following error in our 5-node Cassandra 4.1.7 cluster after updating the replication factor from 1 to 3:
com.datastax.oss.driver.api.core.servererrors.CASWriteUnknownException: CAS operation result is unknown - proposal was not accepted by a quorum. (1 / 2)
Cassandra Version: 4.1.7
Cluster Size: 5 nodes
Replication Strategy: NetworkTopologyStrategy
Replication Factor: Initially 1, changed to 3 for the test keyspace.
Consistency Level: LOCAL_QUORUM for both read and write operations.
The issue persists after the replication factor was changed to 3.
All nodes show as up in nodetool status, and there are no node failures.
We ran nodetool repair on all nodes.
The error occurs even with minimal load (just a single request).
We are seeking insights into why this CASWriteUnknownException occurs after increasing the replication factor. Specifically, we are curious whether the issue is related to quorum consistency or to some other configuration problem in the cluster.
Upvotes: 1
Views: 84
Reputation: 16353
Off the top of my head, the only scenario I can think of where a compare-and-set (CAS) proposal would result in a CASWriteUnknownException is when the client doesn't know (a) whether the proposal reached the accepting nodes, or (b) whether the proposal was accepted.
In this scenario, the status is unknown because the client didn't get a response from the nodes about the proposal, which can happen as a result of a network interruption or partition.
It would have been handy to have (1) the full error message (not just the exception) and (2) the full exception stack trace, since they might provide clues about the cause of the failure. In the absence of those, I'm inclined to think there is an underlying network issue between the clients and the cluster nodes. Cheers!
Upvotes: 0
Reputation: 57798
While it was stated that nodetool repair was run, the symptoms described seem to indicate that it was not. I'd recommend re-running a full repair (not incremental) on all nodes.
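For reference, a full (non-incremental) repair can be invoked like this; the keyspace name here is a placeholder for your test keyspace:

```shell
# Full repair (not incremental) on the affected keyspace.
# Run on every node, or add -pr on each node so each node repairs
# only its primary token ranges and ranges aren't repaired twice.
nodetool repair -full my_keyspace
```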
Increasing the RF also increases load on the cluster, so there may be some compute-resource contention. Have a look at metrics around (successful) read latencies, and see if increasing node resources helps.
But again, to me this sounds like nodetool repair was either not run, not successful, or run in incremental mode.
Edit 20241210
After doing some additional digging, I'm going to go with resource contention as the prime suspect here. This exception is thrown during the proposal phase of a lightweight transaction (LWT) write, so my thought is that the remaining two replicas are too overloaded to accept the transaction proposal before it times out.
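A quick way to read the "(1 / 2)" in the error message: with RF=3, a proposal needs acceptance from a quorum of floor(RF/2) + 1 = 2 replicas, and the message says only 1 of the required 2 accepted. A minimal sketch of that arithmetic (class and method names are illustrative, not from the driver):

```java
public class QuorumMath {
    // Quorum for a given replication factor: floor(rf / 2) + 1
    static int quorum(int rf) {
        return rf / 2 + 1;
    }

    public static void main(String[] args) {
        System.out.println("RF=1 quorum: " + quorum(1)); // 1 -> old setup always "succeeded"
        System.out.println("RF=3 quorum: " + quorum(3)); // 2 -> the "2" in "(1 / 2)"
        System.out.println("RF=5 quorum: " + quorum(5)); // 3
    }
}
```

This also explains why the problem only surfaced after the RF change: at RF=1 a single replica was a quorum by itself, so a slow or unrepaired second replica could never block the proposal phase.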
Upvotes: 2