chrisTina

Reputation: 2368

Cassandra Column Limit

Using Cassandra, I ran this in cqlsh:

cqlsh:info> SELECT count(*) FROM info.customerinfo WHERE KEY = 'ds10128832';

and got the following results:

 count
-------
 10000

Default LIMIT of 10000 was used. Specify your own LIMIT clause to get more results.

Basically, I want to find out how many columns are already stored under the row key ds10128832.

Does the output mean that there are 10000 columns stored under that key, and that I cannot add more because the LIMIT is 10000? Will further columns not be inserted once the key reaches 10000? If so, how can I change this? Must I set a LIMIT? I have a lot of columns to store, so I do not want to have a LIMIT.

Upvotes: 3

Views: 5891

Answers (2)

mildewey

Reputation: 413

Cassandra terminology distinguishes between partitions and rows. The query result indicates that there are 10000 rows in the partition with key ds10128832.

Actually, as catpaws pointed out, there is a default limit of 10000, so you probably have more rows with that partition key. To count them all, you'll need to specify a higher LIMIT, e.g.:

cqlsh:info> SELECT count(*) FROM info.customerinfo WHERE KEY = 'ds10128832' LIMIT 100000;

If the count that comes back equals the LIMIT you set, the result was probably cut off, so raise the LIMIT and run the query again.
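Raising the LIMIT by hand can be automated. Here is a minimal Python sketch of the idea, assuming a hypothetical run_count(limit) callable that runs the SELECT count(*) query above with the given LIMIT and returns the number. That callable is not part of any Cassandra driver; below it is faked with an in-memory number so the loop can be tried on its own.

```python
def count_with_growing_limit(run_count, start_limit=10000, factor=10, max_limit=10**9):
    """Keep raising LIMIT until the count comes back below it.

    run_count is a hypothetical callable: it takes a LIMIT value, runs the
    count query with it, and returns the resulting count.
    """
    limit = start_limit
    while True:
        count = run_count(limit)
        if count < limit:
            # The count is below the LIMIT, so it was not truncated.
            return count
        if limit >= max_limit:
            raise RuntimeError("count still reaches LIMIT %d; giving up" % limit)
        limit = min(limit * factor, max_limit)


def fake_run_count(limit, true_rows=123456):
    """Stand-in for the real query: a count capped at LIMIT (assumed behaviour)."""
    return min(true_rows, limit)


total = count_with_growing_limit(fake_run_count)
print(total)  # 123456
```

A count exactly equal to the LIMIT is treated as "maybe truncated", so the loop tries once more with a larger LIMIT before trusting the number.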

In your question you referred to counting COLUMNS, and I've answered about ROWS. I hope I'm not misunderstanding your intent. Internally, Cassandra stores the "rows" defined by your sorting (clustering) keys as columns (actually sets of columns), which is what I assume you're referring to. Jargon matters in this case.

catpaws mentioned that there is a 2B column limit. That limit counts all the internal columns produced by your sorting keys and rows, so both contribute to it. Each of your rows contributes a number of actual (internal) columns equal to the number of values in your schema that are not part of the primary key.

For example if your table is

CREATE TABLE info.customerinfo ( key text, account text, email text, screenname text, PRIMARY KEY (key, account) );

Then the count above would have counted the number of "account" rows in the partition with key "ds10128832". Each (key, account) combination would be a unique logical row that would (internally) be two columns: one for email, one for screenname. Each customerinfo "key" could hypothetically have 1B such accounts before hitting the 2B column limit imposed by Cassandra.
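The arithmetic behind that 1B figure, as a rough Python sketch. It follows the simplified model described in this answer (one internal column per non-primary-key value per row) and ignores any extra bookkeeping Cassandra may store; the function name is made up for illustration.

```python
MAX_COLUMNS_PER_PARTITION = 2_000_000_000  # the 2B limit mentioned above


def max_rows_per_partition(non_key_columns):
    """Rough upper bound on logical rows in one partition.

    Assumes each logical row costs one internal column per value that is
    not part of the primary key (the simplified model from this answer).
    """
    if non_key_columns < 1:
        raise ValueError("need at least one non-primary-key column")
    return MAX_COLUMNS_PER_PARTITION // non_key_columns


# customerinfo has two non-key values per row: email and screenname
print(max_rows_per_partition(2))  # 1000000000
```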

EDIT: Hitting the limit will throw an exception.

Upvotes: 4

catpaws

Reputation: 2283

The maximum number of columns in a partition (row) is 2B. The default LIMIT in the output means cqlsh is limiting the number of results it shows to 10000. The default LIMIT in the output is explained on this page: http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html?scroll=reference_ds_d35_v2q_xj__specifying-rows-returned-using-limit

In Cassandra 2.1.1, you can use query paging in cqlsh to get output of queries in 100-line chunks followed by the more prompt: http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/paging.html

CQL items that have a hard upper limit, such as columns in a partition, are listed on this page: http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/refLimits.html.

The COUNT(*) used in the select expression returns the number of rows that matched the query: http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html?scroll=reference_ds_d35_v2q_xj__counting-returned-rows

Upvotes: 1
