extraDarker
extraDarker

Reputation: 119

Cassandra: How to query the complete data set?

My table has 77k entries (number of entries keep increasing this a high rate), I need to make a select query in CQL 3. When I do select count(*) ... where (some_conditions) allow filtering I get:

 count
-------
 10000

(1 rows)

Default LIMIT of 10000 was used. Specify your own LIMIT clause to get more results. 

Let's say the 23k rows satisfied this some_condition. The 10000 count above is of the first 10k rows of these 23k rows, right? But how do I get the actual count?

More importantly, How do I get access to all of these 23k rows, so that my python api can perform some in-memory operation on the data in some columns of the rows. Are there a some sort pagination principles in Cassandra CQL 3.

I know I can just increase the limit to a very large number but that's not efficient.

Upvotes: 1

Views: 1554

Answers (3)

yteng
yteng

Reputation: 117

Another way is to write python code, the cqlsh indeed is python script.

use

statement = " select count(*) from SOME_TABLE"

future = session.execute_async(statement)

rows = future.result()

count = 0 for row in rows: count = count + 1

the above is using cassandra python driver PAGE QUERY feature.

Upvotes: 0

Aaron
Aaron

Reputation: 57748

Working Hard is right, and LIMIT is probably what you want. But if you want to "page" through your results at a more detailed level, read through this DataStax document titled: Paging through unordered partitioner results.

This will involve using the token function on your partitioning key. If you want more detailed help than that, you'll have to post your schema.

While I cannot see your complete table schema, by virtue of the fact that you are using ALLOW FILTERING I can tell that you are doing something wrong. Cassandra was not designed to serve data based on multiple secondary indexes. That approach may work with a RDBMS, but over time that query will get really slow. You should really design a column family (table) to suit each query you intend to use frequently. ALLOW FILTERING is not a long-term solution, and should never be used in a production system.

Upvotes: 4

Helping Hand..
Helping Hand..

Reputation: 2440

you just have to specify limit with your query.

let's assume your database is containing under 1 lack records so if you will execute below query it will give you the actual count of the records in table.

select count(*) ... where (some_conditions) allow filtering limit 100000;

Upvotes: 1

Related Questions