aviral sanjay

Reputation: 983

Unable to determine the reason for slow speed of Select query in Cassandra

I have 100 million rows in a Cassandra table. The schema is: id varchar, version int, row_hash varchar, and the primary key is ((version), id). The statement that creates this schema is:

c_sql = "CREATE TABLE IF NOT EXISTS {} (id varchar, version int, row_hash varchar, PRIMARY KEY((version), id))".format( self.table_name )

Does this statement make version the partition key?

Also, my SELECT query, which seems to take longer and longer as the number of rows increases, is:

row_check_query = "SELECT row_hash FROM {} WHERE version = {} AND id = '{}'".format(self.table_name, self.version, key)
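For comparison, a minimal sketch of the same lookup written for a prepared statement with bound parameters, assuming the DataStax Python driver (`cassandra-driver`); the helper name `build_row_check_query` is hypothetical. CQL does not allow bind markers for identifiers, so the table name is still formatted into the string and should come from trusted code, while `version` and `id` are bound values. This avoids hand-quoting `key` (which breaks if it contains a `'`) and lets the server parse the statement once instead of on every call.

```python
def build_row_check_query(table_name, version, key):
    """Return (cql, params) for looking up one row's hash with a prepared statement.

    Hypothetical helper: the table name is interpolated because CQL bind
    markers cannot stand in for identifiers; the partition key (version)
    and the clustering column (id) are passed as bound values instead of
    being quoted by hand.
    """
    cql = "SELECT row_hash FROM {} WHERE version = ? AND id = ?".format(table_name)
    return cql, (version, key)


# Usage with cassandra-driver (assumes an already connected `session`):
#   cql, params = build_row_check_query(self.table_name, self.version, key)
#   prepared = session.prepare(cql)   # in real code, prepare once and cache this
#   row = session.execute(prepared, params).one()
```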

Upvotes: 1

Views: 179

Answers (1)

Oded Peer
Oded Peer

Reputation: 2427

Yes, version is the partition key. id is a clustering column in your case.
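To make the split concrete, here is a toy in-memory model (plain Python, not Cassandra code; the function name `group_into_partitions` is hypothetical) of how PRIMARY KEY ((version), id) lays rows out: one partition per distinct version, with rows ordered by id inside each partition.

```python
from collections import defaultdict


def group_into_partitions(rows):
    """Toy model of PRIMARY KEY ((version), id).

    rows: iterable of (version, id, row_hash) tuples.
    Returns {version: {id: row_hash}}: one partition per distinct version,
    with rows kept in id order inside the partition (clustering order).
    A later row with the same (version, id) overwrites the earlier one,
    much like a Cassandra write with the same primary key is an upsert.
    """
    partitions = defaultdict(dict)
    for version, row_id, row_hash in rows:
        partitions[version][row_id] = row_hash
    # Sort each partition by its clustering column (id).
    return {v: dict(sorted(p.items())) for v, p in partitions.items()}


rows = [(1, "b", "h2"), (1, "a", "h1"), (2, "a", "h3")]
print(group_into_partitions(rows))
# {1: {'a': 'h1', 'b': 'h2'}, 2: {'a': 'h3'}}
```

Every row with version 1 ends up in the same partition, no matter how many distinct ids there are.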

You can use CQL Tracing to analyze your performance issues - https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlshTracing.html
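If you run the query from Python rather than cqlsh, the DataStax Python driver can also request a trace per query (`trace=True` on `Session.execute`, then `ResultSet.get_query_trace()`). The helper below is a hypothetical sketch that ranks trace events by how much time passed on a node before each one was logged, assuming each event exposes `source`, `source_elapsed` (a `timedelta`, possibly `None`) and `description`, as the driver's trace events do. Gaps are computed per node because each node reports elapsed time on its own clock.

```python
from datetime import timedelta


def slowest_trace_steps(events, top=5):
    """Rank trace events by the time that passed on the same node before each one.

    Assumes events have .source (node address), .source_elapsed (timedelta
    since the trace started on that node, or None) and .description.
    Returns up to `top` tuples (gap, source, description), largest gap first.
    """
    by_source = {}
    for ev in events:
        if ev.source_elapsed is not None:
            by_source.setdefault(ev.source, []).append(ev)

    gaps = []
    for source, node_events in by_source.items():
        node_events.sort(key=lambda e: e.source_elapsed)
        previous = timedelta(0)
        for ev in node_events:
            # Time between the previous event on this node and this one.
            gaps.append((ev.source_elapsed - previous, source, ev.description))
            previous = ev.source_elapsed
    gaps.sort(key=lambda g: g[0], reverse=True)
    return gaps[:top]


# Usage with cassandra-driver (assumes a connected `session`, plus a
# `statement` and `params` for the lookup):
#   result = session.execute(statement, params, trace=True)
#   trace = result.get_query_trace()
#   for gap, source, description in slowest_trace_steps(trace.events):
#       print(gap, source, description)
```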

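Depending on your data distribution, you might end up in a "wide row" (wide partition) scenario: every record with the same version is stored in a single partition, so if many records share a version, that partition grows very large, and reading a very large partition can take time.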
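A rough way to check whether you are in that situation: since every row with the same version shares one partition, counting rows per version gives each partition's row count. A minimal sketch (the helper name `largest_partition` is hypothetical, and it works on a plain list of version values rather than on the cluster itself):

```python
from collections import Counter


def largest_partition(versions):
    """Report the biggest partition for PRIMARY KEY ((version), id).

    versions: the version value of every row (must be non-empty).
    Rows that share a version land in the same partition, so the count per
    version is that partition's row count.
    Returns (version, row_count, share_of_all_rows).
    """
    sizes = Counter(versions)
    if not sizes:
        raise ValueError("no rows given")
    total = sum(sizes.values())
    version, count = sizes.most_common(1)[0]
    return version, count, count / total


# 9 of 10 rows share version 1, so one partition holds 90% of the data.
print(largest_partition([1] * 9 + [2]))  # (1, 9, 0.9)
```

With 100 million rows and only a handful of distinct versions, the same arithmetic means each partition holds tens of millions of rows.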

Upvotes: 2
