hard coder

Reputation: 5705

How do I load rows from a Cassandra table as a DataFrame in Spark?

I can load a whole Cassandra table as a DataFrame like this:

val tableDf = sparkSession.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map( "table" -> table, "keyspace" -> keyspace))
      .load()

But I couldn't find a way to fetch rows by primary key, something like:

select * from table where key = ''

Is there a way to do this?

Upvotes: 0

Views: 10425

Answers (2)

tarun

Reputation: 216

The Java way to do the same is:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkSession sparkSession = SparkSession.builder().appName("Spark Sql Job").master("local[*]")
        .config("spark.sql.warehouse.dir", "file:///c:/tmp/spark-warehouse")
        .config("spark.cassandra.connection.host", "localhost")
        .config("spark.cassandra.connection.port", "9042").getOrCreate();

// Load the Cassandra table as a Dataset<Row>
Dataset<Row> rowsDataset = sparkSession.read().format("org.apache.spark.sql.cassandra")
        .option("keyspace", "myschema")
        .option("table", "mytable").load();
// A key filter can be chained here as in the other answer, e.g. .filter("key='YOUR_KEY'")
rowsDataset.show();

It should be the same for Scala, I believe.

Upvotes: 0

undefined_variable
undefined_variable

Reputation: 6218

val tableDf = sparkSession.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map( "table" -> table, "keyspace" -> keyspace))
      .load()
      .filter("key='YOUR_KEY'")

With the spark-cassandra-connector, this filter is pushed down to Cassandra as a predicate, so only the required rows are fetched rather than the whole table.

Dataframes and Predicate pushdown
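
To verify that the pushdown actually happens, you can inspect the physical plan. A minimal sketch, reusing the sparkSession from the question (the keyspace, table, and column name key are placeholders for your own schema):

    val rowDf = sparkSession.read
          .format("org.apache.spark.sql.cassandra")
          .options(Map("table" -> "mytable", "keyspace" -> "myschema")) // placeholder names
          .load()
          .filter("key='YOUR_KEY'") // assumes 'key' is a partition key column

    // If the filter was pushed down, the Cassandra scan in the physical
    // plan lists it under PushedFilters rather than as a Spark-side Filter.
    rowDf.explain(true)

Filters that Cassandra cannot serve (e.g. on non-key, non-indexed columns) are applied on the Spark side instead, after reading the data.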

Upvotes: 7
