I have 100 million rows in a Cassandra table. The schema is:
id int, key varchar, row_hash varchar, version int
and the PK is: ((version), id). The query to create this schema is:
c_sql = "CREATE TABLE IF NOT EXISTS {} (id varchar, version int, row_hash varchar, PRIMARY KEY((version), id))".format( self.table_name )
Does this statement make version the partition key?
Also, my SELECT query, which seems to take longer and longer as the number of rows grows, is:
row_check_query = "SELECT {} FROM {} WHERE {}={} AND {}='{}' ".format( "row_hash", self.table_name, "version", self.version, "id", key )
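As a side note on the lookup itself, the same query can be written with bind markers instead of string formatting. This sidesteps quoting problems in key and lets the driver prepare the statement once and reuse it, which mainly saves repeated parsing of the query text; it does not change how much data Cassandra has to read. A minimal sketch, assuming the DataStax Python driver (cassandra-driver) and an already-connected session object; RowHashLookup is a hypothetical helper name, not part of any library:

```python
class RowHashLookup:
    """Hypothetical helper: prepares the row_hash lookup once and reuses it.

    Assumes `session` is a connected cassandra-driver Session (any object with
    prepare() and execute() methods behaves the same way here).
    """

    def __init__(self, session, table_name):
        self.session = session
        # Table names cannot be bound as parameters, so only the table name is
        # formatted in; version and id become bind markers (?).
        self.prepared = session.prepare(
            "SELECT row_hash FROM {} WHERE version = ? AND id = ?".format(table_name)
        )

    def get(self, version, key):
        # Both the partition key (version) and the clustering column (id) are
        # restricted, so at most one row comes back.
        row = self.session.execute(self.prepared, (version, key)).one()
        return row.row_hash if row is not None else None
```

Usage would be to create lookup = RowHashLookup(session, self.table_name) once, then call lookup.get(self.version, key) for each key.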
Yes, version is the partition key, and id is a clustering column in your case.
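Logically, the table can be pictured as one partition per version value, with the rows inside each partition sorted by id. The toy model below is plain Python and not how Cassandra actually stores data on disk (ToyTable is a hypothetical name); it only shows how a WHERE version = ... AND id = ... lookup uses the two parts of the key:

```python
import bisect


class ToyTable:
    """Toy logical model of PRIMARY KEY((version), id).

    The partition key (version) selects one partition; inside it, rows are
    kept sorted by the clustering column (id).
    """

    def __init__(self):
        # version -> (sorted list of ids, row_hash values in the same order)
        self.partitions = {}

    def insert(self, version, id_, row_hash):
        ids, hashes = self.partitions.setdefault(version, ([], []))
        i = bisect.bisect_left(ids, id_)
        if i < len(ids) and ids[i] == id_:
            hashes[i] = row_hash  # same primary key: overwrite (upsert)
        else:
            ids.insert(i, id_)  # keep clustering order
            hashes.insert(i, row_hash)

    def select_row_hash(self, version, id_):
        # Step 1: the partition key picks exactly one partition.
        partition = self.partitions.get(version)
        if partition is None:
            return None
        ids, hashes = partition
        # Step 2: the clustering column locates the row within that partition.
        i = bisect.bisect_left(ids, id_)
        if i < len(ids) and ids[i] == id_:
            return hashes[i]
        return None
```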
You can use CQL Tracing to analyze your performance issues - https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlshTracing.html
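If you would rather trace from the application than from cqlsh, the DataStax Python driver can request a trace for an individual query. A minimal sketch, assuming cassandra-driver and a connected session; trace_lookup is a hypothetical helper name. Non-prepared statements in this driver use %s placeholders:

```python
def trace_lookup(session, table_name, version, key):
    """Hypothetical helper: run the row_hash lookup with tracing enabled.

    Assumes `session` is a connected cassandra-driver Session. Returns the total
    trace duration and a list of (source_elapsed, source, description) tuples.
    """
    # Simple (non-prepared) statements in cassandra-driver use %s placeholders;
    # the driver escapes the values, so no manual quoting of `key` is needed.
    query = "SELECT row_hash FROM {} WHERE version = %s AND id = %s".format(table_name)
    result = session.execute(query, (version, key), trace=True)
    # get_query_trace() fetches the trace the server recorded for this request.
    trace = result.get_query_trace()
    events = [(e.source_elapsed, e.source, e.description) for e in trace.events]
    return trace.duration, events


# Example (with a real session):
# duration, events = trace_lookup(session, "my_table", 3, "some-id")
# for elapsed, source, description in events:
#     print(elapsed, source, description)
```

The large jumps in source_elapsed between consecutive events show where the time goes. In cqlsh, the equivalent is running TRACING ON before the SELECT.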
Depending on your data distribution, you might end up in a "wide row" scenario, with many records in a single version partition; reading a very large partition can take time.
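To check whether that is happening, count how many rows share each version. A minimal sketch, assuming you can iterate over the version of each row (for example from an export, or from the code that writes the rows); oversized_partitions is a hypothetical helper, and the 100,000-row default is an assumed rule-of-thumb threshold, not a hard Cassandra limit. The last line is the back-of-the-envelope version: 100 million rows spread evenly over a hypothetical 10 distinct versions already puts 10 million rows in each partition.

```python
from collections import Counter


def oversized_partitions(versions, max_rows=100_000):
    """Count rows per partition key value and return those above max_rows.

    `versions` is any iterable holding the partition key (version) of each row.
    `max_rows` is an assumed rule-of-thumb threshold, not a Cassandra limit.
    """
    counts = Counter(versions)
    return {version: n for version, n in counts.items() if n > max_rows}


# 100 million rows over a hypothetical 10 distinct versions, evenly spread:
average_rows_per_partition = 100_000_000 // 10  # 10,000,000 rows per partition
```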