Reputation: 1326
I am using DataStax's Spark Cassandra Connector to connect to Cassandra.
Below is the code I used:
import org.apache.spark.sql.SQLContext
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import com.datastax.spark.connector._

// Build the configuration first, then the SparkContext, and only then the SQLContext
val conf = new SparkConf(true)
  .set("spark.cassandra.connection.host", "127.0.0.1")
  .set("spark.cassandra.auth.username", "the_username")
  .set("spark.cassandra.auth.password", "the_password")
// Note: the second argument here is the application name, not a keyspace
val sc = new SparkContext("local", "the_keyspace", conf)
val sqlContext = new SQLContext(sc)

val table_1 = sc.cassandraTable("the_keyspace", "table_1")
val table_2 = sc.cassandraTable("the_keyspace", "table_2")
Now, the way to expose one of these tables for SQL queries is to map its rows onto a case class and register the result as a table, as below:
case class Person(name: String, age: Int)
sc.cassandraTable[Person]("test", "persons").registerAsTable("persons")
This works fine, but I have 50+ columns in each table, and it is a real pain to type them all out in a case class and to work out each column's type.
Is there a way to overcome this? With spark-csv I am used to loading a CSV file, registering it as a table, and running queries on it without writing a case class placeholder. Is there something similar for my use case here?
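For reference, the spark-csv pattern I mean looks roughly like this (the format name and options are spark-csv's; the file path and table name are placeholders):

val csvDf = sqlContext
  .read.format("com.databricks.spark.csv")
  .option("header", "true")       // first line holds the column names
  .option("inferSchema", "true")  // let spark-csv guess the column types
  .load("/path/to/file.csv")      // placeholder path
csvDf.registerTempTable("my_csv_table") // now queryable via sqlContext.sql(...)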
If there is nothing like that, it would be helpful to know of generators I can use to auto-generate these case classes.
Upvotes: 0
Views: 3734
Reputation: 330353
You can create a DataFrame directly:
val df = sqlContext
.read.format("org.apache.spark.sql.cassandra")
.options(Map("keyspace" -> "test", "table" -> "persons"))
.load()
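The connector reads the column names and types from the Cassandra schema, so nothing has to be typed out by hand. From there you can register the DataFrame and query it with SQL; a minimal sketch (the name and age columns are assumed for illustration):

df.registerTempTable("persons")
df.printSchema() // shows the columns and types inferred from Cassandra
val adults = sqlContext.sql("SELECT name, age FROM persons WHERE age >= 18")
adults.show()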
Upvotes: 2