Reputation: 184
I have the following code.
Dataset<Row> kpiDF = spark.read().format("org.apache.spark.sql.cassandra")
    .options(new HashMap<String, String>(){{put("keyspace", keyspace); put("table", table);}})
    .load()
    .filter("kpi='test'");
My question is: will this load all the data from the Cassandra table and then apply the filter, or will it load only the 'test' kpi data from Cassandra?
Upvotes: 1
Views: 1247
Reputation: 6218
If the column kpi is the partition key, then spark-cassandra-connector will read only the corresponding records.
Predicate pushdown is enabled by default.
If Cassandra cannot satisfy the filter condition, then spark-cassandra-connector will read the complete data and apply the filter afterwards in Spark.
You can check whether the filter is being pushed down to Cassandra using df.explain.
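As a minimal sketch of that check, the question's read can be wrapped in a small program that prints the physical plan. The connection host, keyspace, and table names below are placeholders I made up, not values from the question, and running it needs Spark, the spark-cassandra-connector on the classpath, and a reachable Cassandra node. The exact wording of the plan depends on the Spark and connector versions, so treat this as a way to inspect the plan rather than a guaranteed output format.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class PushdownCheck {
    public static void main(String[] args) {
        // Assumption: host, keyspace, and table are hypothetical placeholders.
        SparkSession spark = SparkSession.builder()
                .appName("pushdown-check")
                .master("local[*]")
                .config("spark.cassandra.connection.host", "127.0.0.1")
                .getOrCreate();

        // Same read as in the question, with the filter on the kpi column.
        Dataset<Row> kpiDF = spark.read()
                .format("org.apache.spark.sql.cassandra")
                .option("keyspace", "my_keyspace")
                .option("table", "my_table")
                .load()
                .filter("kpi = 'test'");

        // Prints the physical plan to stdout without running the query.
        // Look at the Cassandra scan node to see which filters it lists.
        kpiDF.explain();

        spark.stop();
    }
}
```

If the kpi predicate is listed among the filters on the Cassandra scan node, it was pushed down. If it shows up only in a separate Filter step above the scan, Spark is applying it after reading the rows.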
Predicate Pushdown in spark-cassandra-connector
Upvotes: 1