Dimaf

Reputation: 683

Counting 'rows' in a Cassandra column family using the Python driver

How can I count "rows" in a Cassandra column family more efficiently using the Python driver? I currently use the following code:

from cassandra.cluster import Cluster
from sys import stdout

servers = ['server1', 'server2']
cluster = Cluster(servers)
session = cluster.connect()

result = session.execute('select * from ks1.t1')

count = 0

for i in result:
    count += 1

print(count)

Upvotes: 1

Views: 4402

Answers (3)

U W

Reputation: 1290

To achieve this in Python, why not the following:

from cassandra.cluster import Cluster

servers = ['server1', 'server2']
cluster = Cluster(servers)
session = cluster.connect()

result = session.execute('select count(*) from ks1.t1')

count = 0
for row in result: # will only be 1 row
    count += row.count

print(count)

Upvotes: 1

Brad Schoening

Reputation: 1381

Brian Hess has a standalone tool called 'cassandra-count':

Simple program to count the number of records in a Cassandra table. By splitting the token range using the numSplits parameter, you can reduce the amount each query is counting and reduce the probability of timeouts.

It is true that Spark is well suited to this operation; however, the goal of this program is to be a simple utility that does not require Spark.

https://github.com/brianmhess/cassandra-count
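The token-range splitting described above can also be sketched with the Python driver. This is only an illustration of the idea, not the tool's actual implementation. It assumes the cluster uses Murmur3Partitioner (tokens from -2^63 to 2^63 - 1); the helper names `split_token_range` and `count_rows` and the partition key column `id` are hypothetical.

```python
# Token bounds for Murmur3Partitioner (an assumption about the cluster).
MIN_TOKEN = -(2 ** 63)
MAX_TOKEN = 2 ** 63 - 1


def split_token_range(num_splits):
    """Return num_splits inclusive (lo, hi) token ranges covering the ring."""
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    total = MAX_TOKEN - MIN_TOKEN + 1  # number of possible tokens, 2**64
    bounds = [MIN_TOKEN + (total * i) // num_splits
              for i in range(num_splits + 1)]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(num_splits)]


def count_rows(session, table, partition_key, num_splits=16):
    """Sum count(*) over each token sub-range instead of one big query.

    `session` is a connected cassandra.cluster.Session; `table` is a
    'keyspace.table' string and `partition_key` the partition key column(s).
    """
    query = (
        "SELECT count(*) FROM {0} "
        "WHERE token({1}) >= %s AND token({1}) <= %s"
    ).format(table, partition_key)
    total = 0
    for lo, hi in split_token_range(num_splits):
        # Each query scans only a slice of the ring, so it is less
        # likely to hit the coordinator's read timeout.
        total += session.execute(query, (lo, hi)).one()[0]
    return total


# Usage, assuming `session` was created as in the question:
# print(count_rows(session, 'ks1.t1', 'id', num_splits=64))
```

A larger `num_splits` means more, smaller queries; the total is still a full scan of the table, just spread over many requests.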

Upvotes: 0

doanduyhai

Reputation: 8812

This is a terrible way to count rows. Basically, you're doing a full table scan.

Counting rows exactly in a distributed system is hard.

You can get an estimate of the number of partitions (a partition equals a row if your table has no clustering columns) using nodetool tablestats/cfstats.
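If you want that estimate from a Python script, one option is to run nodetool and parse its output. The sketch below is an assumption-laden example: it expects nodetool to be on the PATH, and it looks for the "Number of partitions (estimate)" line (or "Number of keys (estimate)" in older versions), whose exact wording may differ in your Cassandra release. The function names are hypothetical.

```python
import re
import subprocess

# Label printed by tablestats; older cfstats output says "keys" instead.
_ESTIMATE = re.compile(r"Number of (?:partitions|keys) \(estimate\):\s*(\d+)")


def parse_partition_estimate(text):
    """Return the first partition estimate found in nodetool output, or None."""
    match = _ESTIMATE.search(text)
    return int(match.group(1)) if match else None


def estimate_partitions(keyspace, table):
    """Run `nodetool tablestats keyspace.table` and parse the estimate."""
    output = subprocess.check_output(
        ["nodetool", "tablestats", "{0}.{1}".format(keyspace, table)]
    )
    return parse_partition_estimate(output.decode("utf-8", "replace"))


# Usage on a Cassandra node:
# print(estimate_partitions('ks1', 't1'))
```

Keep in mind that nodetool reports figures for the node it connects to, so this is not a cluster-wide count.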


If you absolutely need an exact count of the number of rows, use a Spark installation co-located with the Cassandra nodes to load the data into Spark memory locally and count it there. This way the counting is distributed and does not overwhelm the coordinator.

Sample Scala code:

import com.datastax.spark.connector._

sc.cassandraTable("keyspace", "table_name").count()

Upvotes: 0
