Reputation: 171
Is there a way in the Spark Cassandra Connector to achieve the server-side filtering that is equivalent to PER PARTITION LIMIT in CQL, or perPartitionLimit in the native Cassandra Java driver?
Note that here it is a limit per Cassandra partition, not per Spark partition (which is supported by the existing limit function in the connector).
Spark 2.0.1, connector 2.0.0-M3
Upvotes: 1
Views: 429
Reputation: 16576
The Spark Cassandra Connector's built-in limit API (as of 2.0.0-M3) can only limit by C* token range.
If you are using Cassandra 3.6 or greater, you can manually add a per partition limit in the .where API.
See https://issues.apache.org/jira/browse/CASSANDRA-7017
sc.cassandraTable(...).where("PER PARTITION LIMIT 10")
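A fuller sketch of the workaround (a runnable outline only against a live Cassandra >= 3.6 cluster; the keyspace, table, and host names here are illustrative, not from the original):

```scala
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical setup: host, keyspace, and table names are placeholders.
val conf = new SparkConf()
  .setAppName("per-partition-limit-example")
  .set("spark.cassandra.connection.host", "127.0.0.1")
val sc = new SparkContext(conf)

// The .where clause is appended verbatim to the generated CQL, so
// "PER PARTITION LIMIT 10" limits rows per Cassandra partition,
// not per Spark partition.
val rows = sc.cassandraTable("ks", "events")
  .where("PER PARTITION LIMIT 10")

rows.collect().foreach(println)
```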
Upvotes: 0
Reputation: 171
Thanks for the initial answer from RussS. I got it to work as follows:
First, we need to use "PER PARTITION LIMIT".
Second, if you have other where clauses, the limit needs to be combined with one of them, as follows:
sc.cassandraTable(...).where("event_type = 1 PER PARTITION LIMIT 5")
instead of
sc.cassandraTable(...).where("event_type = 1").where("PER PARTITION LIMIT 5")
Otherwise an AND keyword will be generated before "PER PARTITION LIMIT", which will cause a CQL syntax error.
Upvotes: 1