Reputation: 31
Cassandra CQL3 SELECT
statement allows using IN
tuples like in
SELECT * FROM posts
WHERE userid='john doe' AND (blog_title, posted_at)
IN (('John''s Blog', '2012-01-01), ('Extreme Chess', '2014-06-01'))
as seen from CQ3 spec: http://cassandra.apache.org/doc/cql3/CQL.html#selectStmt
Is there limitation for amount of tuples can be used in SELECT's IN clause? What is the maximum?
Upvotes: 3
Views: 3392
Reputation: 57808
Rebecca Mills of DataStax provides a definite limit on the number of keys allowed in an IN
statement (Things you should be doing when using Cassandra drivers - point #22):
...specifically the limit on the number of keys in an IN statement, the most you can have is 65535. But practically speaking you should only be using small numbers of keys in INs, just for performance reasons.
I assume that limit would also apply to the number of tuples that you could specify as well. Honestly though, I wouldn't try to top that out. If you sent it a large number, it wouldn't perform well at all. The CQL documentation on the SELECT CLAUSE warns users about this:
When not to use IN
The recommendations about when not to use an index apply to using IN in the WHERE clause. Under most conditions, using IN in the WHERE clause is not recommended. Using IN can degrade performance because usually many nodes must be queried. For example, in a single, local data center cluster with 30 nodes, a replication factor of 3, and a consistency level of LOCAL_QUORUM, a single key query goes out to two nodes, but if the query uses the IN condition, the number of nodes being queried are most likely even higher, up to 20 nodes depending on where the keys fall in the token range.
Suffice to say that while the maximum number of tuples you could pass is a matter of mathematics, the number of tuples you should pass will depend on your cluster configuration, JVM implementation, and a little bit of common sense.
Upvotes: 8