Bob

Reputation: 1459

Cassandra NoHostAvailable and Failed to open native connection to Cassandra

I'm trying to get Cassandra set up, and I'm running into some issues where Google and other questions here haven't been helpful.

From cqlsh, I get NoHostAvailable: when I try to query tables after creating them:

Connected to DS Cluster at 10.101.49.129:9042.
[cqlsh 5.0.1 | Cassandra 3.0.9 | CQL spec 3.4.0 | Native protocol v4]
Use HELP for help.
cqlsh> use test;
cqlsh:test> describe kv;

CREATE TABLE test.kv (
    key text PRIMARY KEY,
    value int
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

cqlsh:test> select * from kv;
NoHostAvailable:

All of the nodes are up and running according to nodetool.

When I try to connect from Spark, I get something similar: everything works fine and I can connect to and manipulate tables, until I try to access any data, and then it fails.

val df = sql.read.format("org.apache.spark.sql.cassandra").options(Map("keyspace" -> "test2", "table" -> "words")).load
df: org.apache.spark.sql.DataFrame = [word: string, count: int]
df.show
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 25, HOSTNAME): java.io.IOException: Failed to open native connection to Cassandra at {10.101.49.129, 10.101.50.24, 10.101.61.251, 10.101.49.141, 10.101.60.94, 10.101.63.27, 10.101.49.5}:9042
    at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:162)
    at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$3.apply(CassandraConnector.scala:148)
    at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$3.apply(CassandraConnector.scala:148)
    at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:31)
    at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:56)
    at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:81)
    at com.datastax.spark.connector.rdd.CassandraTableScanRDD.compute(CassandraTableScanRDD.scala:325)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
...
Caused by: java.lang.NoSuchMethodError: com.google.common.util.concurrent.Futures.withFallback(Lcom/google/common/util/concurrent/ListenableFuture;Lcom/google/common/util/concurrent/FutureFallback;Ljava/util/concurrent/Executor;)Lcom/google/common/util/concurrent/ListenableFuture;
    at com.datastax.driver.core.Connection.initAsync(Connection.java:177)
    at com.datastax.driver.core.Connection$Factory.open(Connection.java:731)
    at com.datastax.driver.core.ControlConnection.tryConnect(ControlConnection.java:251)

I apologize if this is a naive question, and thank you in advance.

Upvotes: 0

Views: 1308

Answers (2)

RussS

Reputation: 16576

NoHostAvailable

Replication in Cassandra is done via one of two strategies, which are specified per keyspace.

SimpleStrategy :

Represents a naive approach and spreads data globally among nodes based on the token ranges owned by each node. There is no differentiation between nodes that are in different datacenters.

SimpleStrategy takes a single parameter, replication_factor, which sets how many replicas of any partition will exist within the entire cluster.
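
For example, a keyspace using SimpleStrategy with three replicas is created like this (the keyspace name is illustrative):

CREATE KEYSPACE simple_ks
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};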

NetworkTopologyStrategy :

Represents a per-datacenter replication strategy. With this strategy, data is replicated based on the token ranges owned by nodes, but only within a datacenter.

This means that if you have two datacenters with nodes [Token] and a full token range of [0-20]

 Datacenter A : [1], [11]
 Datacenter B : [2], [12]

Then with SimpleStrategy the range would be viewed as being split like this:

[1] [2-10] [11] [12-20]

which means we would end up with two very unbalanced nodes that each own only a single token.

If instead we use NetworkTopologyStrategy, the responsibilities look like this:

Datacenter A : [1-10], [11-20]
Datacenter B : [2-11], [12-1] (wrapping around)

The strategy itself takes a dictionary as its parameter, listing each datacenter and how many replicas should exist in that datacenter.

For example you can set the replication as

'A' : '1'
'B' : '2'

which would create 3 replicas for the data in total: 2 replicas in B but only 1 in A.
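
In CQL that replication map looks like this (the keyspace name is illustrative; the datacenter names must match what nodetool status reports):

CREATE KEYSPACE nts_ks
    WITH replication = {'class': 'NetworkTopologyStrategy', 'A': 1, 'B': 2};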

This is where a lot of users run into trouble, since you could specify

'a_mispelled' : '4'

which would mean that a datacenter which doesn't exist is expected to hold replicas for that particular keyspace. Cassandra would then respond to every request against that keyspace that it could not obtain replicas, because it can't find the datacenter.
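
If your keyspace's replication does name a non-existent datacenter, the fix is to correct it in place; a sketch using the keyspace from the question (the datacenter name DC1 is an assumption, check nodetool status for your real names, and run a repair after changing replication):

ALTER KEYSPACE test
    WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};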

With vnodes you can get skewed replication (if required) by giving different nodes different numbers of vnodes. Without vnodes it just requires shrinking the token ranges covered by the nodes that have less capacity.
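
The vnode count is a per-node setting in cassandra.yaml; the values below are only illustrative:

# cassandra.yaml on one node
num_tokens: 256    # a node configured with 512 would own roughly twice the data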

How data gets read

Regardless of the replication settings, a query can be sent to any node, because the mapping from data to replicas is completely deterministic. Given a keyspace, table, and partition key, Cassandra can determine which nodes a particular token should live on and obtain that information, as long as the Consistency Level for the query can be met.
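
You can see the consistency level cqlsh uses, and change it, with the CONSISTENCY command; the session below is illustrative:

cqlsh> CONSISTENCY;
Current consistency level is ONE.
cqlsh> CONSISTENCY QUORUM;
Consistency level set to QUORUM.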

Guava Error

The error you are seeing most commonly comes from a bad packaging of the Spark Cassandra Connector. Working with the Java Cassandra Driver and Hadoop at the same time is difficult, since the two require different (incompatible) versions of Guava. To get around this, the SCC provides builds with its Guava dependency shaded, but re-including the Java Driver as a separate dependency, or using an old build, can break things.
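
A minimal build.sbt sketch of the safe setup (the connector version here is an assumption; pick the one matching your Spark and Scala versions):

// build.sbt: depend only on the connector, which ships with Guava shaded
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "2.0.5"

// Do NOT also add "com.datastax.cassandra" % "cassandra-driver-core":
// pulling the unshaded Java Driver back in reintroduces the Guava conflict.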

Upvotes: 2

questionaire

Reputation: 2585

For me it looks like two issues:

First, for cqlsh, you seem to have misconfigured the replication factor of your keyspace. What's the RF you've used there?

See also the DataStax documentation.
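
You can check the replication settings of the keyspace directly from cqlsh (Cassandra 3.x schema tables; the keyspace name is taken from the question):

SELECT keyspace_name, replication
FROM system_schema.keyspaces
WHERE keyspace_name = 'test';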

For the Spark issue, it seems that the Google Guava dependency isn't compatible with your driver.

In the latest Guava release there was an API change. See:

java.lang.NoClassDefFoundError: com/google/common/util/concurrent/FutureFallback

Upvotes: 1
