Vish

Reputation: 184

Does Spark load all the data from Cassandra?

I have the following code:

Dataset<Row> kpiDF = spark.read().format("org.apache.spark.sql.cassandra")
    .options(new HashMap<String, String>(){{put("keyspace", keyspace); put("table", table);}})
    .load()
    .filter("kpi='test'");

My question is: will this load all the data from the Cassandra table and then apply the filter, or will it load only the rows where kpi is 'test'?

Upvotes: 1

Views: 1247

Answers (1)

undefined_variable

Reputation: 6218

If the column kpi is the partition key, then spark-cassandra-connector will read only the matching records.

Predicate pushdown is enabled by default.

If Cassandra cannot satisfy the filter condition, spark-cassandra-connector will read the complete data and then apply the filter in Spark.
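To make the difference between the two cases concrete, here is a toy model in plain Java. It is not the connector's code and does not use Spark or Cassandra: the class PushdownSketch, its in-memory TABLE, and the rowsFetched counter are all made up for illustration. It only shows why a filter on the partition key can be answered by reading one partition, while a filter that cannot be pushed down forces every row to be fetched first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PushdownSketch {
    // Hypothetical in-memory "table": rows grouped by partition key (kpi).
    static final Map<String, List<String>> TABLE = new HashMap<>();
    static {
        TABLE.put("test", Arrays.asList("row1", "row2"));
        TABLE.put("other", Arrays.asList("row3", "row4", "row5"));
    }

    // Number of rows fetched from the "store" by the most recent read.
    static int rowsFetched = 0;

    // Pushdown case: the filter is handed to the store,
    // so only the matching partition is read.
    static List<String> readWithPushdown(String kpi) {
        List<String> rows = TABLE.getOrDefault(kpi, new ArrayList<String>());
        rowsFetched = rows.size();
        return new ArrayList<>(rows);
    }

    // No-pushdown case: every row is fetched,
    // then the filter is applied on the client side.
    static List<String> readThenFilter(String kpi) {
        rowsFetched = 0;
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> partition : TABLE.entrySet()) {
            for (String row : partition.getValue()) {
                rowsFetched++;
                if (partition.getKey().equals(kpi)) {
                    result.add(row);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> pushed = readWithPushdown("test");
        System.out.println("pushdown: result=" + pushed.size() + " fetched=" + rowsFetched);
        List<String> scanned = readThenFilter("test");
        System.out.println("full scan: result=" + scanned.size() + " fetched=" + rowsFetched);
    }
}
```

Both methods return the same rows; only the amount of data fetched differs, which is the cost you pay when the filter is applied after the load.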

You can check whether the filter is being pushed down to Cassandra with df.explain(): the physical plan it prints lists the filters that were pushed to the data source.

Predicate Pushdown in spark-cassandra-connector

Upvotes: 1
