Reputation: 13108
I've seen references to a ‘Number of key(estimate)
from running nodetool cfstats
, but at least in my system (Cassandra version 3.11.3), I don't see it:
Table: XXXXXX
SSTable count: 4
Space used (live): 2393755943
Space used (total): 2393755943
Space used by snapshots (total): 0
Off heap memory used (total): 2529880
SSTable Compression Ratio: 0.11501749368144083
Number of partitions (estimate): 1146
Memtable cell count: 296777
Memtable data size: 147223380
Memtable off heap memory used: 0
Memtable switch count: 127
Local read count: 9
Local read latency: NaN ms
Local write count: 44951572
Local write latency: 0.043 ms
Pending flushes: 0
Percent repaired: 0.0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used: 2144
Bloom filter off heap memory used: 2112
Index summary off heap memory used: 240
Compression metadata off heap memory used: 2527528
Compacted partition minimum bytes: 447
Compacted partition maximum bytes: 43388628
Compacted partition mean bytes: 13547448
Average live cells per slice (last five minutes): NaN
Maximum live cells per slice (last five minutes): 0
Average tombstones per slice (last five minutes): NaN
Maximum tombstones per slice (last five minutes): 0
Dropped Mutations: 0
Is there some way to approximate select count(*) from XXXXXX
with this version of Cassandra?
Upvotes: 3
Views: 451
Reputation: 57798
This was changed with CASSANDRA-13722. The "number of keys" estimate always meant "number of partitions" anyway, this just makes it apparent.
To approximate the number of rows in a large table, you could take that value (number of partitions) as a starting point. Then approximate an average of the number of clustering key combinations (rows), and you should be able to make an educated guess at it.
Another thought, would figure out the size (in bytes) of one row. Then look at the P50 of the output of nodetool tablehistograms keyspacename.tablename
:
Percentile SSTables Write Latency Read Latency Partition Size Cell Count
(micros) (micros) (bytes)
50% 2.00 35.43 4866.32 124 1
Divide the P50 (50th percentile) of Partition Size by the size of one row. That should give you the average number of rows returned for that table. Then multiply that by the "number of partitions" and you should have your number for that node.
How does one get the size of one row in Cassandra?
$ bin/cqlsh 127.0.0.1 -u aaron -p yourPasswordSucks -e "SELECT * FROM system.local WHERE key='local';" > local.txt
$ ls -al local.txt
-rw-r--r-- 1 z001mj8 DHC\Domain Users 2321 Sep 16 15:08 local.txt
Obviously, you'll want to take things out like pipe delimiters and the row header (not to mention accounting for the size difference in strings vs. numerics), but the final byte size of the file should put you in the ballpark.
Upvotes: 1
Reputation: 2196
The "number of keys" is the same as "the number of partitions" - again an estimate. If your partition key is the primary key (no clustering columns), then you'll have an estimate for the number of rows on that node. Otherwise, it's simply that, the estimate of number of partition key values.
-Jim
Upvotes: 1