Deepali

Reputation: 25

How to read data from Cassandra on DataStax cloud in Spark

How can I read data from a DataStax cloud Cassandra database in Spark 2.0?

This is what I tried:

val df = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map(
    "keyspace" -> "my_keyspace",
    "table" -> "my_table",
    "spark.cassandra.connection.config.cloud.path" -> "file:///home/training/secure-connect-My_path.zip",
    "spark.cassandra.auth.password" -> "password",
    "spark.cassandra.auth.username" -> "Username"
  ))
  .load()

I'm getting this error:

Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: org.apache.spark.sql.cassandra. Please find packages at http://spark.apache.org/third-party-projects.html

Since I'm using the DataStax zip file, why do I need to install Cassandra or take any additional steps?

Using the same zip file, I can read the data in a Java program. Why am I unable to read it into Spark?

Upvotes: 0

Views: 1670

Answers (2)

Alex Ott

Reputation: 87119

DataStax Astra is natively supported only in Spark Cassandra Connector 2.5.0+, which requires Spark 2.4 (although it works with 2.3 as well). In theory you could extract the certificates and other information from the secure bundle and use them directly, but that is a tedious task, so it's better to upgrade your Spark version.

But the initial issue is that the package is not provided; see @flightc's answer.
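
For a compiled application, one way to provide the package is to declare the connector as a build dependency. The following is only a minimal build.sbt sketch: it assumes an sbt project built with Scala 2.11, and it reuses the connector coordinates from the --packages option in Erick Ramirez's answer. Adjust both to match your own setup.

```scala
// build.sbt fragment (assumes sbt and a Scala 2.11 build; adjust to your environment)
scalaVersion := "2.11.12"

// Spark Cassandra Connector 2.5.0; %% appends the Scala binary version (_2.11)
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "2.5.0"
```

If the job is launched with spark-submit or spark-shell instead, passing the same coordinates through --packages serves the same purpose.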

Upvotes: 2

Erick Ramirez

Reputation: 16303

You're on the right track. If you were connecting from a Spark shell, you would pass the details like this:

$ spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.11:2.5.0 \
  --files /path/to/your/secure-connect-dbname.zip \
  --conf spark.cassandra.connection.config.cloud.path=secure-connect-dbname.zip \
  --conf spark.cassandra.auth.username=astra_username \
  --conf spark.cassandra.auth.password=astra_password

Then your code would look something like:

import org.apache.spark.sql.cassandra._

// Note: cassandraFormat takes the table name first, then the keyspace
val df = spark.read.cassandraFormat("tbl_name", "ks_name").load()
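
Since the question runs a standalone program rather than the shell, the same settings could also be set on the SparkSession itself. This is a rough sketch, not a definitive implementation: it assumes Spark 2.4 with connector 2.5.0 on the classpath, that setting spark.files behaves like the --files option above, and that the object name ReadFromAstra, the bundle path, the credentials, and the keyspace and table names are all placeholders.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical application name; all paths, credentials and names are placeholders.
object ReadFromAstra {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ReadFromAstra")
      // Distribute the secure connect bundle to the executors (assumed equivalent to --files)
      .config("spark.files", "/path/to/your/secure-connect-dbname.zip")
      // Refer to the distributed bundle by its file name
      .config("spark.cassandra.connection.config.cloud.path", "secure-connect-dbname.zip")
      .config("spark.cassandra.auth.username", "astra_username")
      .config("spark.cassandra.auth.password", "astra_password")
      .getOrCreate()

    // Explicit options avoid any ambiguity about argument order
    val df = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "ks_name", "table" -> "tbl_name"))
      .load()

    df.show(10)
    spark.stop()
  }
}
```

The master URL is left unset here on the assumption that the application is launched through spark-submit, which supplies it.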

For details, see the Spark Cassandra Connector documentation on connecting to Astra. There's also this blog post from Alex Ott, "Advanced Apache Cassandra Analytics Now Open For All". Cheers!

Upvotes: 2
