Reputation: 1326
I am using DataStax's Spark Cassandra Connector to connect to Cassandra.
Below is the code I used:
import org.apache.spark.sql.SQLContext
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import com.datastax.spark.connector._

// Build the configuration first, then the SparkContext, and only then the SQLContext
val conf = new SparkConf(true)
  .set("spark.cassandra.connection.host", "127.0.0.1")
  .set("spark.cassandra.auth.username", "the_username")
  .set("spark.cassandra.auth.password", "the_password")
// Note: the second argument here is the application name, not a keyspace
val sc = new SparkContext("local", "the_keyspace", conf)
val sqlContext = new SQLContext(sc)

val table_1 = sc.cassandraTable("the_keyspace", "table_1")
val table_2 = sc.cassandraTable("the_keyspace", "table_2")
Now, the way to expose one of these tables for SQL queries is to map its rows onto a case class and register the result as a table, as below:
case class Person(name: String, age: Int)
sc.cassandraTable[Person]("test", "persons").registerAsTable("persons")
This works fine, but I have 50+ columns in each table, and it is a real pain to type them all out in a case class and to work out each column's type.
Is there a way to overcome this? With spark-csv I am used to loading a CSV file, registering it as a table, and running queries on it without writing a case class placeholder. Is there something similar for my use case here?
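For reference, the spark-csv pattern I mean looks roughly like this (the format name and options are spark-csv's; the file path and table name are placeholders):

val csvDf = sqlContext
  .read.format("com.databricks.spark.csv")
  .option("header", "true")       // first line holds the column names
  .option("inferSchema", "true")  // let spark-csv guess the column types
  .load("/path/to/file.csv")      // placeholder path
csvDf.registerTempTable("my_csv_table") // now queryable via sqlContext.sql(...)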
If there is nothing like that, it would be helpful to know of generators I can use to auto-generate these case classes.
Upvotes: 0
Views: 3734
Reputation: 330353
You can create a DataFrame directly:
val df = sqlContext
.read.format("org.apache.spark.sql.cassandra")
.options(Map("keyspace" -> "test", "table" -> "persons"))
.load()
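The connector reads the column names and types from the Cassandra schema, so nothing has to be typed out by hand. From there you can register the DataFrame and query it with SQL; a minimal sketch (the name and age columns are assumed for illustration):

df.registerTempTable("persons")
df.printSchema() // shows the columns and types inferred from Cassandra
val adults = sqlContext.sql("SELECT name, age FROM persons WHERE age >= 18")
adults.show()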
Upvotes: 2