user9270170

Cassandra query returns only 5000 rows

I use this code to query data from Cassandra, but the result contains only 5000 rows, even though the table actually holds 120,000 rows. How can I query all the data?

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
from cassandra.query import SimpleStatement
import pandas as pd

cluster = Cluster(contact_points=['192.168.2.4'],port=9042)
session = cluster.connect()

def testContectRemoteDatabase():
    contact_points = ['192.168.2.4']
    auth_provider = PlainTextAuthProvider(username='XXX', password='XX')
    cluster = Cluster(contact_points=contact_points, auth_provider=auth_provider)
    session = cluster.connect()
    cql_str = 'select * from DB1.mytable ;'
    simple_statement = SimpleStatement(cql_str, consistency_level=ConsistencyLevel.ONE)
    execute_result = session.execute(simple_statement, timeout=None)
    result = execute_result._current_rows
    cluster.shutdown()
    df = pd.DataFrame(result)
    df.to_csv('./my_test.csv', index=False, mode='w', header=True)

if __name__ == '__main__':
    testContectRemoteDatabase()

Upvotes: 3

Views: 1532

Answers (1)

Selcuk

Reputation: 59184

This is by design. The driver pages query results automatically; from the documentation:

By default, Session.default_fetch_size controls how many rows will be fetched per page. This can be overridden per-query by setting fetch_size on a Statement. By default, each page will contain at most 5000 rows.

Setting fetch_size to None disables automatic paging, so the whole result set is returned at once:

simple_statement = SimpleStatement(
    cql_str, consistency_level=ConsistencyLevel.ONE, fetch_size=None)
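
An alternative that keeps paging enabled, as a minimal sketch: the question reads `execute_result._current_rows`, a private attribute that holds only the page fetched so far, which is why exactly 5000 rows come back. Iterating over the object returned by `session.execute()` instead lets the driver fetch the following pages transparently. The helper name `export_all_rows` and its parameters are hypothetical, and it assumes the driver's default row factory (named tuples), so that `row._fields` gives the column names. It uses the standard `csv` module instead of pandas so that rows are written as they arrive rather than collected in memory first.

```python
import csv


def export_all_rows(session, statement, csv_path):
    """Write every row returned by `statement` to `csv_path`.

    `session` is assumed to be a connected cassandra Session and
    `statement` a query string or SimpleStatement (hypothetical helper).
    Returns the number of data rows written.
    """
    count = 0
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        # Iterate the result itself, not _current_rows: the driver
        # requests the next page whenever the current one is exhausted.
        for row in session.execute(statement):
            if count == 0:
                # Assumes named-tuple rows (the driver's default row factory).
                writer.writerow(row._fields)
            writer.writerow(row)
            count += 1
    # Note: an empty result produces an empty file with no header line.
    return count
```

With the question's code, this would be called as `export_all_rows(session, simple_statement, './my_test.csv')` before `cluster.shutdown()`.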

Upvotes: 4
