Reputation: 723
I want to SELECT stuff WHERE value is not NAN. How to do it? I tried different options:
WHERE value != NAN
WHERE value is not NAN
WHERE value == value
None of these attempts succeeded.
I see that it is possible to write WHERE value = NAN
, but is there a way to express inequality?
Upvotes: 3
Views: 344
Reputation: 16393
Cassandra is designed for OLTP workloads so reads are optimised for retrieving specific partitions such that the filter is of the form:
SELECT ... FROM ... WHERE partition_key = ?
A query that has an inequality filter is retrieving "everything except partition X" and is not really OLTP because Cassandra has to perform a full table scan to check all records which do NOT match the filter. This query does not scale so is not supported.
As far as I'm aware, the inequality operator (!=
) only works in the conditional section of lightweight transactions that only applies to UPDATE
or DELETE
, not SELECT
statements. For example:
UPDATE ... SET ... WHERE ... IF condition
If you have a complex search use case, you should look at using Elasticsearch or Apache Solr on top of Cassandra. If you have an analytics use case, consider using Apache Spark to query the data in Cassandra. Cheers!
Upvotes: 2
Reputation: 13791
As you noted, none of the alternatives you tried work today:
although the !=
operator is recognized by the parser, it is unfortunately not supported in WHERE clause. This is true for both Cassandra and Scylla. I opened https://github.com/scylladb/scylladb/issues/12736 as an feature request in Scylla to add support for !=
.
The IS NOT ...
syntax is not relevant - it is only supported in the specific way IS NOT NULL
, and even that is not supported in WHERE (see https://github.com/scylladb/scylladb/issues/8517).
WHERE value = value
(note a single equals sign is the SQL and CQL syntax, not '==' as in C) is currently not supported, you can only check equality of a column to a constant, not check the equality of two columns. Again this is true for both Cassandra and Scylla. Scylla is now in the process of improving the power of the WHERE expressions, and at the end of this process this sort of expression will be supported.
I think your best solution today is just to read all the data, and filter out NaN yourself, in the client. The performance loss should be minimal - just the network overhead - because even if Scylla did this filtering for you it would still need to read the data from disk and do this filtering - it's not like it can get this inequality check "for free". This is unlike the equality check (WHERE value = 3
) where Scylla can jump directly to the position of value = 3
(if "value" is the partition key or clustering key) and read only that. This efficiency concern is the reason why historically Scylla and Cassandra supported the equality operator, and not the inequality operator.
Upvotes: 4