Eric Vergnaud
Eric Vergnaud

Reputation: 21

DataStax DevCenter issue with count(*)

we're using DevCenter 1.2.0 and from the DataStax web site the below does not seem to be adressed by 1.2.1.

DevCenter 1.2.0 forces an implicit limit on 'select...' queries, which allows values up to 1000, and cannot be empty. The limit value from the limit box can be overridden by an explicit LIMIT clause.

This works fine when running 'standard' queries, but makes it impossible to get a valid result for select count() queries. This is because in Cassandra the LIMIT clause applies to the source rather than the result (unlike SQL btw). So if you do 'select count() from my_table' in DevCenter, you will get a result of 100 if the current limit is 100, 200 if the current limit is 200 and so on. Pretty useless I'm afraid.

If you override the limit by doing 'select count(*) from my_table limit 1000000000000' then you get a timeout.

Would it make sense to remove the implicit limit for 'select count' queries in DevCenter?

Upvotes: 2

Views: 3498

Answers (2)

djatnieks
djatnieks

Reputation: 734

DevCenter version 1.6.0 no longer adds an implicit LIMIT clause to select statements (as long as you are connected to Cassandra 2.0 or higher).

Upvotes: 2

Zain Malik
Zain Malik

Reputation: 162

This is not an issue with Devcenter. There are some thing you should consider

  • You will get the timeout with or without limit, because it has to do a full scan of your table. Because the query have no 'WHERE' clause. So having limit '1000000000000' or no limit is pretty much the same thing as for timeout.
  • If you need to do frequent queries to know the amount of all rows. This query shouldn't be your 1st preference. For example consider using any other table with a counter type.
  • "select count(*)" read the same number of rows as "select *" so why just remove the limit only from count wont help.
  • Limit isn't as useless. Imagine you just need a sample of 10 rows results out of millions.

In my opinion you should consider increasing the timeout or making a small app using any cassandra driver to do that count for you. Or consider HIVE or changing your model if you are going to use this query too much.

Upvotes: 3

Related Questions