Reputation: 5705
I can load a whole Cassandra table as a DataFrame like this:
val tableDf = sparkSession.read
.format("org.apache.spark.sql.cassandra")
.options(Map( "table" -> table, "keyspace" -> keyspace))
.load()
But I couldn't find a way to fetch rows by primary key, something like:
select * from table where key = ''
Is there a way to do this?
Upvotes: 0
Views: 10425
Reputation: 216
The Java way to do the same is:
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.SparkSession;

// Build a local SparkSession configured for the Cassandra connector.
SparkSession sparkSession = SparkSession.builder().appName("Spark Sql Job").master("local[*]")
        .config("spark.sql.warehouse.dir", "file:///c:/tmp/spark-warehouse")
        .config("spark.cassandra.connection.host", "localhost")
        .config("spark.cassandra.connection.port", "9042").getOrCreate();
SQLContext sqlCtx = sparkSession.sqlContext();

// Load the Cassandra table myschema.mytable as a Dataset<Row>.
Dataset<Row> rowsDataset = sqlCtx.read().format("org.apache.spark.sql.cassandra").option("keyspace", "myschema")
        .option("table", "mytable").load();
rowsDataset.show();
It should be the same for Scala, I believe.
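Indeed, a direct Scala translation of the Java snippet above would look like the following sketch (reusing the same placeholder host, port, keyspace, and table names):

import org.apache.spark.sql.SparkSession

// Same configuration as the Java example, translated to Scala.
val sparkSession = SparkSession.builder()
  .appName("Spark Sql Job")
  .master("local[*]")
  .config("spark.sql.warehouse.dir", "file:///c:/tmp/spark-warehouse")
  .config("spark.cassandra.connection.host", "localhost")
  .config("spark.cassandra.connection.port", "9042")
  .getOrCreate()

val rowsDataset = sparkSession.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "myschema", "table" -> "mytable"))
  .load()
rowsDataset.show()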
Upvotes: 0
Reputation: 6218
val tableDf = sparkSession.read
.format("org.apache.spark.sql.cassandra")
.options(Map( "table" -> table, "keyspace" -> keyspace))
.load()
.filter("key='YOUR_KEY'")
With this filter, the spark-cassandra-connector applies predicate pushdown, so only the required rows are fetched from Cassandra.
See the connector documentation: Dataframes and Predicate pushdown
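If you want to confirm that the predicate is actually pushed down to Cassandra rather than evaluated in Spark, you can inspect the physical plan. A minimal sketch (keyspace, table, and key names are placeholders for your own schema):

// Placeholder names; substitute your keyspace, table, and key column.
val filtered = sparkSession.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("table" -> "mytable", "keyspace" -> "myschema"))
  .load()
  .filter("key = 'YOUR_KEY'")

// The physical plan should list the key predicate under "PushedFilters",
// meaning Cassandra, not Spark, performs the row selection.
filtered.explain()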
Upvotes: 7