chrisTina
chrisTina

Reputation: 2368

Cassandra - filter rows based on a range

Using cassandra and spark and datastax's spark-cassandra-connector.

In the spark-cassandra-connector it support gives such filter example:

sc.cassandraTable("test", "cars").select("id", "model").where("color = ?", "black").toArray.foreach(println)

Basically it filters the color column with black. However, can I filter the row based on a range? Like I want to filter the range column which is a long type and the range falls between 100000 and 200000 ? Does cql support such a range filter?

Upvotes: 0

Views: 1061

Answers (1)

Nicola Ferraro
Nicola Ferraro

Reputation: 4189

CQL supports range queries only on clustering columns. Range queries can be expressed as in SQL by using two bounding conditions on the same field, for instance, in spark-cassandra-connector you will write:

.where("my_long >= ? and my_long < ?", 1L, 100L)

This will work as long as the "my_long" column is the first clustering column. Clustering columns are the columns that follows the declaration of the partition columns in the primary key.

For instance, you can run range queries on my_long column if the primary key is declared as follows:

  • PRIMARY KEY (pk1, my_long)
  • PRIMARY KEY (pk1, my_long, pk3)
  • PRIMARY KEY ((pk1, pk2), my_long)
  • PRIMARY KEY ((pk1, pk2), my_long, pk4)
  • ...

As you see, in all the preceding cases, my_long follows the declaration of partition key in the primary key. If the column belongs to the clustering columns but it's not the first one, you have to provide an equality condition for all preceding columns. For example:

  • PRIMARY KEY (pk1, pk2, my_long) --> .where("pk2=? and my_long>? and my_long
  • PRIMARY KEY (pk1, pk2, pk3, my_long) --> .where("pk2=? and pk3=? and my_long>? and my_long
  • PRIMARY KEY ((pk1, pk2), pk3, my_long) --> .where("pk3=? and my_long>? and my_long
  • PRIMARY KEY ((pk1, pk2), pk3, my_long, pk5) --> .where("pk3=? and my_long>? and my_long

Note: spark-cassandra-connector adds by default the clause "ALLOW FILTERING" in all the queries. If you try to run the examples above in cqlsh, you have to specify that clause manually.

Upvotes: 5

Related Questions