How can I get a row count in Cassandra with only CQL?

Question

I would like our developers to be able to tell how many rows (roughly) are in a table.

Running "select count(*) from table" doesn't work because it fails with a timeout error unless the table is very small (under a million rows).

Increasing the client_timeout in ~/.cassandra/cqlshrc has never worked for us, and even if it did, it's probably not a good idea to let developers run 10-minute queries against production. :)

And since this is a production cluster, developers do not have ssh access to the servers to run "nodetool" locally.

Running "nodetool" remotely requires enabling remote JMX and applying JMX security. That seems a little much just to get an estimate of a table size. Is there anything dangerous someone could do with that JMX access?

Are there any other options at all to get any kind of estimate of the number of rows in a table?

Thanks!

Jonathan · Accepted Answer

You could try exporting the table to a CSV file from cqlsh and then doing a line count on that file. This isn't ideal performance-wise, but it won't time out the way count does.

COPY table_name TO 'filename_or_path.csv';

For more information on copy see: http://docs.datastax.com/en/cql/3.1/cql/cql_reference/copy_r.html
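Once the export exists, the count itself could be done with something like the sketch below. It is only an illustration: the function name count_csv_rows and the has_header flag are made up for this example, and it assumes the file is the UTF-8 CSV written by COPY. It uses Python's csv module rather than a raw line count so that a text value containing a newline is still counted as one row.

```python
import csv


def count_csv_rows(path, has_header=False):
    """Count CSV records in the file at `path`, reading one record at a time.

    `has_header` is a hypothetical convenience flag: set it to True only if
    the export was written with a header line.
    """
    # newline="" lets the csv module handle line endings and quoted newlines.
    with open(path, newline="", encoding="utf-8") as handle:
        total = sum(1 for _ in csv.reader(handle))
    if has_header and total > 0:
        total -= 1
    return total


if __name__ == "__main__":
    import sys

    # Usage: python count_rows.py filename_or_path.csv
    if len(sys.argv) > 1:
        print(count_csv_rows(sys.argv[1]))
```

The file is streamed, so memory use stays small even for a large export, although the export itself still reads the whole table.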

Another, more "estimative" option, if you are running DataStax Cassandra, is to install OpsCenter on one of the nodes. It provides a UI you can expose to the developers, with a lot of useful metrics and health status. I don't think it shows the number of rows directly, but it does show the amount of data, which you could divide by the average row size if you know that value.

http://www.datastax.com/products/datastax-enterprise-visual-admin

*EDIT: I just realized the link above is for the enterprise edition of OpsCenter, but does include a description of it. There is (or at least was) a community edition available as well.
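The data-size estimate described above is just a division, roughly like the sketch below. The function name and the replication_factor parameter are assumptions added for illustration: if the size you read is a cluster-wide total, it most likely includes every replica, so dividing by the replication factor may be needed. Compression and per-row storage overhead will also skew the result, so treat the output as an order-of-magnitude figure.

```python
def estimate_row_count(total_data_bytes, avg_row_bytes, replication_factor=1):
    """Rough row estimate: data size divided by average row size.

    `replication_factor` is an assumption for cluster-wide totals that count
    every replica; leave it at 1 if the size covers a single copy of the data.
    """
    if avg_row_bytes <= 0 or replication_factor <= 0:
        raise ValueError("avg_row_bytes and replication_factor must be positive")
    return round(total_data_bytes / (avg_row_bytes * replication_factor))


# Example: 120 GiB on disk, about 512 bytes per row, replication factor 3.
print(estimate_row_count(120 * 1024**3, 512, replication_factor=3))  # 83886080
```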
