stefantigro

Reputation: 452

Cassandra LOCAL_QUORUM is waiting for remote datacenter responses

We have a cluster with two datacenters (one in the EU, one in the US), each with 4 nodes, deployed in AWS. The nodes in each datacenter are spread across 3 racks (availability zones). The cluster has a keyspace test with replication: NetworkTopologyStrategy, eu-west: 3, us-east: 3. In that keyspace we have a table called mytable with a single column, id text.
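For reference, the setup described above corresponds roughly to the following DDL (the datacenter names in the replication map must match whatever your snitch reports, e.g. in nodetool status):

CREATE KEYSPACE test
    WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'eu-west': 3,
        'us-east': 3
    };

CREATE TABLE test.mytable (
    id text PRIMARY KEY
);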

Now, we were doing some performance tests on the database. In cqlsh, with a consistency level of LOCAL_QUORUM and TRACING ON, we ran some inserts and noticed that the requests were not behaving as we expected.
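The cqlsh session looked roughly like this (the exact INSERT is illustrative):

CONSISTENCY LOCAL_QUORUM;
TRACING ON;
INSERT INTO test.mytable (id) VALUES ('some-id');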

From the tracing data we found that the coordinator node was, as expected, hitting 2 other local nodes, but it was also sending a request to one of the remote datacenter's nodes. The problem was that the coordinator waited not only for the local nodes (which responded almost immediately) but for the remote response as well.

Since our two datacenters are geographically far apart, our requests were taking a very long time to complete.

Note: this does not happen with DSE, but our understanding was that we shouldn't need to pay crazy money just for LOCAL_QUORUM to work as expected.

Upvotes: 1

Views: 292

Answers (2)

Carlos Monroy Nieblas

Reputation: 2283

For functional and performance tests it is better to use the driver instead of cqlsh, since most of the time that is how you will be interacting with the database.

For this case, you can use a DC-aware load balancing policy like:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;

// Build a cluster that routes requests only to the named local DC
Cluster cluster = Cluster.builder()
    .addContactPoint("127.0.0.1")
    .withLoadBalancingPolicy(
            DCAwareRoundRobinPolicy.builder()
                    .withLocalDc("myLocalDC")   // must match the DC name your snitch reports
                    .build()
    ).build();

This is modified from the example here: all the clauses that allow interaction with remote datacenters have been removed, since your purpose is to isolate the calls to the local datacenter.
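To reproduce the original test through the driver, you could then set LOCAL_QUORUM per statement. A minimal sketch (the keyspace and table names follow the question; the bound value is a placeholder):

import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

// Connect to the test keyspace using the cluster built above
Session session = cluster.connect("test");

// Run the insert at LOCAL_QUORUM, so only local replicas are awaited
SimpleStatement insert = new SimpleStatement(
        "INSERT INTO mytable (id) VALUES (?)", "some-id");
insert.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
session.execute(insert);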

Upvotes: 1

Alex Ott

Reputation: 87369

There is a high probability that you're hitting CASSANDRA-9753, where a non-zero dclocal_read_repair_chance triggers a query against the remote DC. Check the trace for a hint that read repair was triggered for your query. If it was, you can set dclocal_read_repair_chance to 0 - this parameter is deprecated anyway...
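If that turns out to be the cause, the table option can be changed with a single ALTER (table name taken from the question):

ALTER TABLE test.mytable WITH dclocal_read_repair_chance = 0.0;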

Upvotes: 1
