Dalpapa

Reputation: 177

Cassandra Cluster in AWS Multi Region VPC

I am trying to achieve the following architecture for my Cassandra cluster:

So far, I've been able to configure the cluster, install OpsCenter, and check that every agent is working fine. (For reference, I used GossipingPropertyFileSnitch and put "dc=us-west, rack=1b" in the rack config.)
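GossipingPropertyFileSnitch reads each node's datacenter and rack from cassandra-rackdc.properties, with `dc=` and `rack=` on separate lines. The sketch below is a hypothetical helper (`parse_rackdc` is not part of Cassandra) that parses that file's text, which can be handy for comparing what every node will report. It assumes plain `key=value` lines and `#` comments only.

```python
def parse_rackdc(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and '#' comments.

    Hypothetical helper; real Java .properties files allow more syntax
    (':' separators, line continuations, escapes) than this handles.
    """
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '='
            props[key.strip()] = value.strip()
    return props


# Contents of cassandra-rackdc.properties on one node (dc and rack on separate lines).
example = """\
# GossipingPropertyFileSnitch reads these two keys
dc=us-west
rack=1b
"""

settings = parse_rackdc(example)
print(settings["dc"], settings["rack"])  # us-west 1b
```

Running it over the file from every node is a quick way to confirm the `dc` values are the ones you intended.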

My problem is that my HTTP API is slow and times out far too often. I've been trying to run some import scripts (over HTTP, which insert into Cassandra via the CQL driver) and keep getting this type of error:

Error while executing batch:com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.

For reference, the corresponding error in system.log is:

ERROR [SharedPool-Worker-1] 2015-03-04 19:25:39,598 ErrorMessage.java:243 - Unexpected exception during request
com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2201) ~[guava-16.0.jar:na]
at com.google.common.cache.LocalCache.get(LocalCache.java:3934) ~[guava-16.0.jar:na]
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3938) ~[guava-16.0.jar:na]
at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4821) ~[guava-16.0.jar:na]
at org.apache.cassandra.auth.PermissionsCache.getPermissions(PermissionsCache.java:56) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.auth.Auth.getPermissions(Auth.java:78) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.service.ClientState.authorize(ClientState.java:352) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.service.ClientState.ensureHasPermission(ClientState.java:250) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.service.ClientState.hasAccess(ClientState.java:244) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.service.ClientState.hasColumnFamilyAccess(ClientState.java:228) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.cql3.statements.ModificationStatement.checkAccess(ModificationStatement.java:128) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.cql3.statements.BatchStatement.checkAccess(BatchStatement.java:86) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.cql3.QueryProcessor.processBatch(QueryProcessor.java:500) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.transport.messages.BatchMessage.execute(BatchMessage.java:215) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:439) [apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:335) [apache-cassandra-2.1.3.jar:2.1.3]
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) [netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) [netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32) [netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.channel.AbstractChannelHandlerContext$8.run(AbstractChannelHandlerContext.java:324) [netty-all-4.0.23.Final.jar:4.0.23.Final]
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) [na:1.8.0_31]
at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) [apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-2.1.3.jar:2.1.3]
at java.lang.Thread.run(Unknown Source) [na:1.8.0_31]
Caused by: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
at org.apache.cassandra.auth.Auth.selectUser(Auth.java:279) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.auth.Auth.isSuperuser(Auth.java:100) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.auth.AuthenticatedUser.isSuper(AuthenticatedUser.java:50) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.auth.CassandraAuthorizer.authorize(CassandraAuthorizer.java:67) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.auth.PermissionsCache$1.load(PermissionsCache.java:82) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.auth.PermissionsCache$1.load(PermissionsCache.java:79) ~[apache-cassandra-2.1.3.jar:2.1.3]
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3524) ~[guava-16.0.jar:na]
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2317) ~[guava-16.0.jar:na]
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2280) ~[guava-16.0.jar:na]
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2195) ~[guava-16.0.jar:na]
... 23 common frames omitted
Caused by: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
at org.apache.cassandra.service.ReadCallback.get(ReadCallback.java:103) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.service.AbstractReadExecutor.get(AbstractReadExecutor.java:139) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1338) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1265) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1188) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:253) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:206) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.auth.Auth.selectUser(Auth.java:268) ~[apache-cassandra-2.1.3.jar:2.1.3]
... 32 common frames omitted
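Java stack traces are easiest to read from the last `Caused by:` upward. In the trace above, the innermost cause is the `ReadTimeoutException`, raised under `Auth.selectUser`, which was reached from `PermissionsCache.getPermissions` via `ClientState.authorize`: the read that timed out was the permission check's user lookup in the `system_auth` keyspace, not the batch's own data. The helper below is a hypothetical sketch (not a Cassandra tool) that extracts the innermost cause and its frames from a pasted trace, assuming the standard `at pkg.Class.method(File.java:N)` frame format.

```python
import re

# Matches a Java frame such as "at org.apache.Foo.bar(Foo.java:12)" and captures "org.apache.Foo.bar".
FRAME = re.compile(r"^\s*at\s+([\w$.]+)\(")


def innermost_cause(trace: str) -> tuple[str, list[str]]:
    """Return the last 'Caused by:' message and the methods in its listed frames."""
    message = ""
    frames: list[str] = []
    for line in trace.splitlines():
        stripped = line.strip()
        if stripped.startswith("Caused by: "):
            message = stripped[len("Caused by: "):]
            frames = []  # a later 'Caused by' is a deeper cause; restart collection
        elif message:
            match = FRAME.match(line)
            if match:
                frames.append(match.group(1))
    return message, frames


# Abbreviated excerpt of the system.log trace above.
TRACE = """\
com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: ...
at org.apache.cassandra.auth.PermissionsCache.getPermissions(PermissionsCache.java:56)
Caused by: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
at org.apache.cassandra.auth.Auth.selectUser(Auth.java:279)
Caused by: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
at org.apache.cassandra.service.ReadCallback.get(ReadCallback.java:103)
at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:206)
at org.apache.cassandra.auth.Auth.selectUser(Auth.java:268)
... 32 common frames omitted
"""

message, frames = innermost_cause(TRACE)
print(message)
print(frames[-1])  # org.apache.cassandra.auth.Auth.selectUser
```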

It did work sometimes; I was even able to connect with DevCenter and see my data. But it fails far too often.

My temporary solution was to enable communication on each instance's public IP while still letting the nodes talk to each other over their private IPs. I'm running my import with this setup at the moment.
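The workaround amounts to an address-selection rule: nodes in the same region talk over private IPs, nodes in different regions over public IPs. The toy model below (hypothetical `Node` class, made-up addresses, and a second datacenter `us-east` that the question does not name) just makes that rule explicit. As far as I know, Ec2MultiRegionSnitch applies this rule automatically, and GossipingPropertyFileSnitch does too with `prefer_local=true` in cassandra-rackdc.properties, provided each node advertises its public IP as `broadcast_address`. Note that the rule keys on the datacenter name, so cross-region traffic only goes over public IPs if each region has its own `dc` value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical description of one EC2 Cassandra node."""
    name: str
    dc: str
    private_ip: str
    public_ip: str


def address_for(local: Node, peer: Node) -> str:
    """Toy rule: private IP inside the same datacenter, public IP across datacenters."""
    return peer.private_ip if peer.dc == local.dc else peer.public_ip


# Made-up nodes; 203.0.113.0/24 and 198.51.100.0/24 are documentation ranges.
west_a = Node("west-a", "us-west", "10.0.1.10", "203.0.113.10")
west_b = Node("west-b", "us-west", "10.0.1.11", "203.0.113.11")
east_a = Node("east-a", "us-east", "10.1.1.10", "198.51.100.10")

print(address_for(west_a, west_b))  # 10.0.1.11
print(address_for(west_a, east_a))  # 198.51.100.10
```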

Now I'm still wondering:

Thank you for your help.

Upvotes: 2

Views: 1782

Answers (1)

Roman Tumaykin

Reputation: 1931

I personally don't think this solution is viable, for several reasons:

  1. There will be significant latency between the regions. All the data you store in the cluster will have to be replicated across the internet, through either a VPN or SSL encryption/decryption, depending on the method you choose. And I assume you chose Cassandra because you plan to store a lot of data.
  2. You will pay through the nose, since inter-node traffic is chatty (gossip runs constantly on top of replication) and your data will travel back and forth between endpoints many times. And you will pay $0.02 for every GB sent from one node to another between regions.
  3. You will keep getting timeouts unless you increase all the relevant timeout values in cassandra.yaml, and then it will simply be slow.
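To put point 2 in numbers: the bill scales with how much data you write, how many times each byte crosses a region boundary, and how much extra traffic repair and hints add. The sketch below is back-of-the-envelope arithmetic, not an AWS calculator; every input except the $0.02/GB price quoted above is a made-up assumption.

```python
def monthly_transfer_cost(gb_written_per_day: float,
                          copies_crossing_regions: int,
                          overhead_factor: float = 1.0,
                          price_per_gb: float = 0.02,
                          days: int = 30) -> float:
    """Rough monthly inter-region transfer bill in USD.

    copies_crossing_regions: how many times each written byte leaves its region.
    overhead_factor: multiplier for extra traffic (repair, hints, read repair).
    price_per_gb: the $0.02/GB figure from the answer; check current AWS pricing.
    """
    gb_per_month = gb_written_per_day * copies_crossing_regions * overhead_factor * days
    return round(gb_per_month * price_per_gb, 2)


# Hypothetical workload: 50 GB of new data per day, 3 copies sent to the other region.
print(monthly_transfer_cost(50, 3))                     # 90.0
print(monthly_transfer_cost(50, 3, overhead_factor=2))  # 180.0
```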

You can set up node-to-node SSL; here are the details.

I am not 100% sure what causes the timeout, but the message strongly suggests that the node did not receive responses from the other nodes within the timeout:

Operation timed out - received only 0 responses.
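That message comes from a coordinator that needed a certain number of replica replies (set by the consistency level) and counted none before the timeout expired. The simplified model below is not Cassandra's code; the latencies are invented, and `math.inf` stands for a replica the coordinator cannot reach at all (for example, because its private IP is not routable from another region). It shows why an unreachable replica produces exactly "received only 0 responses", and why raising the timeout only helps when replies are slow rather than missing.

```python
import math


def simulate_read(replica_latencies_ms: list[float],
                  block_for: int,
                  timeout_ms: float) -> tuple[bool, int, float]:
    """Simplified coordinator: wait for block_for replies or give up at timeout_ms.

    Returns (succeeded, responses_counted, time_waited_ms). A latency of math.inf
    models a replica that never answers.
    """
    if not 1 <= block_for <= len(replica_latencies_ms):
        raise ValueError("block_for must be between 1 and the number of replicas")
    arrived = sorted(t for t in replica_latencies_ms if t <= timeout_ms)
    if len(arrived) >= block_for:
        # The coordinator answers as soon as the block_for-th reply arrives.
        return True, block_for, arrived[block_for - 1]
    # Too few replies in time: the client sees "received only N responses".
    return False, len(arrived), timeout_ms


READ_TIMEOUT_MS = 5000  # Cassandra 2.1 default read_request_timeout_in_ms

print(simulate_read([math.inf], 1, READ_TIMEOUT_MS))        # (False, 0, 5000)
print(simulate_read([2.0, 80.0, 85.0], 2, READ_TIMEOUT_MS))  # (True, 2, 80.0)
print(simulate_read([2.0, 3.0, 80.0], 2, READ_TIMEOUT_MS))   # (True, 2, 3.0)
```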

I would recommend setting up a multi-datacenter cluster with one datacenter per region. This way your application talks to a set of local nodes, and the data then gets replicated to the nodes in the remote datacenter. Cassandra has ways to reduce the amount of traffic between datacenters in different regions.
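With one datacenter per region, a keyspace uses NetworkTopologyStrategy to set a replication factor per datacenter, and the application reads and writes at LOCAL_QUORUM (or LOCAL_ONE), so it only waits on replicas in its own region and does not wait for the other region. The sketch below only builds the CQL text and does the quorum arithmetic; the keyspace name `my_app`, the second datacenter name `us-east`, and a replication factor of 3 per datacenter are assumptions. The datacenter names must match what the snitch reports (the `dc=` values).

```python
def quorum(replicas: int) -> int:
    """Responses needed for a quorum of `replicas` copies: floor(n / 2) + 1."""
    return replicas // 2 + 1


def create_keyspace_cql(keyspace: str, rf_per_dc: dict[str, int]) -> str:
    """Build a CREATE KEYSPACE statement using NetworkTopologyStrategy."""
    dcs = ", ".join(f"'{dc}': {rf}" for dc, rf in rf_per_dc.items())
    return (f"CREATE KEYSPACE {keyspace} WITH replication = "
            f"{{'class': 'NetworkTopologyStrategy', {dcs}}};")


rf = {"us-west": 3, "us-east": 3}  # hypothetical replication factors
print(create_keyspace_cql("my_app", rf))
# QUORUM counts replicas in every datacenter; LOCAL_QUORUM only the local one.
print("QUORUM needs", quorum(sum(rf.values())))     # 4: must include a remote reply
print("LOCAL_QUORUM needs", quorum(rf["us-west"]))  # 2: can be met inside us-west
```

It is generally advisable to give the `system_auth` keyspace the same kind of per-datacenter replication, since the timed-out read in the question's trace was the permission check's lookup in that keyspace.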

Here is a great slide presentation about multi-region datacenters; it also has some useful information that I did not cover here.

Upvotes: 4
