Alan Miranda

Reputation: 183

Error writing a DataFrame to a Cassandra table on Amazon Keyspaces

I'm trying to write a DataFrame to a table in Amazon Keyspaces, but I'm getting the error below:

Stack trace:

dfExploded.write.cassandraFormat(table = "table", keyspace = "hub").mode(SaveMode.Append).save()
21/08/18 21:45:18 WARN DefaultTokenFactoryRegistry: [s0] Unsupported partitioner 'com.amazonaws.cassandra.DefaultPartitioner', token map will be empty.
java.lang.AssertionError: assertion failed: There are no contact points in the given set of hosts
  at scala.Predef$.assert(Predef.scala:223)
  at com.datastax.spark.connector.cql.LocalNodeFirstLoadBalancingPolicy$.determineDataCenter(LocalNodeFirstLoadBalancingPolicy.scala:195)
  at com.datastax.spark.connector.cql.CassandraConnector$.$anonfun$dataCenterNodes$1(CassandraConnector.scala:192)
  at scala.Option.getOrElse(Option.scala:189)
  at com.datastax.spark.connector.cql.CassandraConnector$.dataCenterNodes(CassandraConnector.scala:192)
  at com.datastax.spark.connector.cql.CassandraConnector$.alternativeConnectionConfigs(CassandraConnector.scala:207)
  at com.datastax.spark.connector.cql.CassandraConnector$.$anonfun$sessionCache$3(CassandraConnector.scala:169)
  at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:34)
  at com.datastax.spark.connector.cql.RefCountedCache.syncAcquire(RefCountedCache.scala:69)
  at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:57)
  at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:89)
  at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:111)
  at com.datastax.spark.connector.datasource.CassandraCatalog$.com$datastax$spark$connector$datasource$CassandraCatalog$$getMetadata(CassandraCatalog.scala:455)
  at com.datastax.spark.connector.datasource.CassandraCatalog$.getTableMetaData(CassandraCatalog.scala:421)
  at org.apache.spark.sql.cassandra.DefaultSource.getTable(DefaultSource.scala:68)
  at org.apache.spark.sql.cassandra.DefaultSource.inferSchema(DefaultSource.scala:72)
  at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.getTableFromProvider(DataSourceV2Utils.scala:81)
  at org.apache.spark.sql.DataFrameWriter.getTable$1(DataFrameWriter.scala:339)
  at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:355)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:301)

SparkSubmit:

spark-submit --deploy-mode cluster --master yarn  \
--conf=spark.cassandra.connection.port="9142" \
--conf=spark.cassandra.connection.host="cassandra.sa-east-1.amazonaws.com" \
--conf=spark.cassandra.auth.username="BUU" \
--conf=spark.cassandra.auth.password="123456789" \
--conf=spark.cassandra.connection.ssl.enabled="true" \
--conf=spark.cassandra.connection.ssl.trustStore.path="cassandra_truststore.jks" \
--conf=spark.cassandra.connection.ssl.trustStore.password="123456"

Connecting with cqlsh works fine, but the Spark job fails with this error.

Upvotes: 2

Views: 366

Answers (2)

Arturo Hinojosa

Reputation: 106

To read and write data between Amazon Keyspaces and Apache Spark using the open-source Spark Cassandra Connector, all you have to do is update the partitioner for your Keyspaces account.

Docs: https://docs.aws.amazon.com/keyspaces/latest/devguide/spark-integrating.html
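
For reference, here is a minimal sketch of that change. It assumes the partitioner is read and updated through the system.local table, as the linked AWS guide describes, and that you connect to the same endpoint and port that already work from cqlsh. The file name set_partitioner.cql and the KEYSPACES_USER / KEYSPACES_PASS variables are placeholders of mine, not anything from the question.

```shell
#!/bin/sh
# Write the two CQL statements to a file (file name is a placeholder).
# Assumption: the account-level partitioner is exposed via system.local,
# per the AWS guide linked above.
cat > set_partitioner.cql <<'EOF'
SELECT partitioner FROM system.local;
UPDATE system.local SET partitioner='org.apache.cassandra.dht.Murmur3Partitioner' WHERE key='local';
EOF

# Show what will be executed.
cat set_partitioner.cql

# Then run the file with cqlsh against the endpoint from the question.
# KEYSPACES_USER and KEYSPACES_PASS are hypothetical environment variables.
# cqlsh cassandra.sa-east-1.amazonaws.com 9142 --ssl \
#   -u "$KEYSPACES_USER" -p "$KEYSPACES_PASS" -f set_partitioner.cql
```

Re-running the SELECT afterwards is a simple way to confirm the setting before retrying the Spark job.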

Upvotes: 3

Erick Ramirez

Reputation: 16393

The issue, as the warning states, is that AWS Keyspaces uses a partitioner (com.amazonaws.cassandra.DefaultPartitioner) that isn't supported by the Spark Cassandra Connector.

There aren't many public docs on what the underlying database is for AWS Keyspaces, so I've long suspected that there's a CQL API engine sitting in front of Keyspaces so that it "looks" like Cassandra, but it's probably backed by something else like DynamoDB. I'm more than happy to be corrected by someone here from AWS just so I can put that to bed. 🙂

The default Cassandra partitioner is Murmur3Partitioner, and it is the only recommended partitioner. The older partitioners such as RandomPartitioner and ByteOrderedPartitioner are supported only for backward compatibility and should never be used for new clusters.

Finally, we don't test the Spark connector against AWS Keyspaces, so be prepared for a lot of surprises there. Cheers!

Upvotes: 1
