Ray Qiu

Reputation: 171

Spark Cassandra Connector - perPartitionLimit

Is there a way in the Spark Cassandra Connector to achieve server-side filtering equivalent to PER PARTITION LIMIT in CQL, or perPartitionLimit in the native Cassandra Java driver?

Note that this is a limit per Cassandra partition, not per Spark partition (which is already supported by the connector's existing limit function).

Spark 2.0.1, connector 2.0.0-M3

Upvotes: 1

Views: 429

Answers (2)

RussS

Reputation: 16576

The Spark Cassandra Connector's built-in limit API (as of 2.0.0-M3) can only limit by C* token range.

If you are using Cassandra 3.6 or later, you can manually add a per-partition limit in the .where API.

See https://issues.apache.org/jira/browse/CASSANDRA-7017

sc.cassandraTable(...).where("PER PARTITION LIMIT 10") 

Upvotes: 0

Ray Qiu

Reputation: 171

Thanks to RussS for the initial answer. I got it to work as follows:

First, we need to use "PER PARTITION LIMIT".

Second, if you have other where clauses, the limit needs to be combined with one of them, like this:

sc.cassandraTable(...).where("event_type = 1 PER PARTITION LIMIT 5")

instead of

sc.cassandraTable(...).where("event_type = 1").where("PER PARTITION LIMIT 5")

Otherwise an AND keyword is generated before "PER PARTITION LIMIT" in the resulting CQL, which causes a syntax error.
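To illustrate why the chained form fails, here is a minimal sketch (not the connector's actual source; `buildWhereCql` is a hypothetical stand-in) of how separate .where clauses get joined with AND when the final CQL is assembled:

```scala
// Hypothetical sketch of the connector's clause joining: each .where call
// contributes one clause, and the clauses are joined with AND.
def buildWhereCql(clauses: Seq[String]): String =
  if (clauses.isEmpty) "" else "WHERE " + clauses.mkString(" AND ")

// Two separate .where calls: AND lands before PER PARTITION LIMIT,
// which is not valid CQL.
val chained = buildWhereCql(Seq("event_type = 1", "PER PARTITION LIMIT 5"))
// WHERE event_type = 1 AND PER PARTITION LIMIT 5

// One combined clause: valid CQL.
val combined = buildWhereCql(Seq("event_type = 1 PER PARTITION LIMIT 5"))
// WHERE event_type = 1 PER PARTITION LIMIT 5
```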

Upvotes: 1

Related Questions