Reputation: 16979
Is the in
operator on the cluster key more efficient than multiple equality queries?
select * from table where primaryKey1 = 1 and primaryKey2 = 2 and clusterKey in (1,2)
vs. 2 queries.
select * from table where primaryKey1 = 1 and primaryKey2 = 2 and clusterKey = 1
select * from table where primaryKey1 = 1 and primaryKey2 = 2 and clusterKey = 2
Upvotes: 3
Views: 1720
Reputation: 186
It depends on how much values you are going to put inside IN operator. In terms of efficiency, both will take around the same time to return resultSet. But if your values are going to be larger, it's a good practice to use multiple queries instead of using IN operator.
This is due to the distributed node structure of Cassandra when you are using a single query with IN keyword, the request/response will be handled by only one node. Moreover, if something goes wrong Cassandra will run the whole query again from scratch and retrieved results won't be stored.
You can use Async queries in Java or python which will run multiple queries as a batch process. I will allow request/response to go through multiple nodes and if a single query fails it can retry again for only that specific query.
For example:
CREATE TABLE IF NOT EXISTS users (id uuid PRIMARY KEY, name text);
SELECT * FROM users WHERE id IN (
e6af74a8-4711-4609-a94f-2cbfab9695e5,
281336f4-2a52-4535-847c-11a4d3682ec1,
c32b8d37-89bd-4dfe-a7d5-5f0258692d05
);
This is not necessarily optimal: this query will be sent to a coordinator node, which will then have to query replicas for each partition key. Considering that we have a smart token-aware driver, it would be more efficient to send an individual query for each partition key (SELECT * FROM users WHERE id = ?), which would reach the right replica directly. Then all that's left is to collate the results client-side.
Future<List<ResultSet>> future = ResultSets.queryAllAsList(session,
"SELECT * FROM users WHERE id = ?",UUID.fromString("e6af74a8-4711-
4609-a94f-2cbfab9695e5"), UUID.fromString("281336f4-2a52-4535-847c-
11a4d3682ec1"));
for (ResultSet rs : future.get()) {
process the result set
}
For more clarification please read links below: https://lostechies.com/ryansvihla/2014/09/22/cassandra-query-patterns-not-using-the-in-query-for-multiple-partitions/
https://www.datastax.com/dev/blog/java-driver-async-queries
Upvotes: 6