Reputation: 33
I'm new to Cassandra and I've been having some issues trying to delete multiple rows in table. I have a table defined as follows:
CREATE TABLE aze.isis_users (
identifier text PRIMARY KEY,
iteration_count int,
password_expires_on date,
passwordword blob,
roleidentifier text,
salt blob
) WITH additional_write_policy = '99p'
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND cdc = false
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '16', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND default_time_to_live = 0
AND extensions = {}
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair = 'BLOCKING'
AND speculative_retry = '99p';
I would like to be able to delete all rows with roleidentifier = lim_user
I get the following 2 errors with 2 different query :
Upvotes: 3
Views: 4067
Reputation: 16353
The same question was asked on https://community.datastax.com/questions/13219/ so I'm re-posting my answer here for posterity.
In Cassandra, database records known as "partitions" are distributed randomly across nodes in the cluster. The partition key (row ID in traditional RDBMS) is used by Cassandra to determine which node the partition (record) is stored. In order to delete a partition, you need to specify the partition key.
Understanding partitions and how data is distributed is critical to learning why Cassandra is different to traditional RDBMS. Patrick McFadin explains the concept of partitions in this short video in great detail with examples and diagrams which I highly recommend. It is an extract of the free DS201 Foundations of Apache Cassandra course on DataStax Academy.
For what it's worth, I've previously explained the differences between partition keys and primary keys in https://community.datastax.com/questions/6171/ if you're interested. I've included examples to make it easier to understand the concepts.
In your table, the partition key (which also happens to be the primary key) is the column identifier. Your delete statement:
DELETE FROM isis_users WHERE roleidentifier = "lim_user"
does not include identifier
in the WHERE
clause so Cassandra does not know which partition(s) to delete. This is the reason you are getting the error:
Some partition key parts are missing: identifier
It is not possible to filter with roleidentifier
since this column is not in the primary key. DELETE
statements can only contain columns which are in the primary key.
So why doesn't it work the same as RDBMS? The short answer is that Cassandra solves difficult problems of scalability and availability. Imagine you had 2000 nodes in your cluster with billions or trillions of records in a table all scattered across the nodes. If Cassandra didn't require you to specify the partition key, it will need to perform a full table scan on all 2000 nodes in order to find the records you're after. This works in traditional relational databases because (1) they can't hold as many records, and (2) aren't distributed across physical locations in the same way as Cassandra.
Since you're new to Cassandra, have a look at datastax.com/dev. If you scroll down the page, you'll see there are free hands-on tutorials which only take a few minutes to complete. The Cassandra Fundamentals course is a good starting point since you can quickly learn key concepts. These tutorials are interactive and run in your browser so you don't have anything to install or configure. Cheers!
Upvotes: 2
Reputation: 87234
It doesn't work this way in Cassandra. You need to have a full or partial primary key specified in the DELETE
command. If you want to delete by non-primary/partition key, then you need first to find rows with that value, extract primary key, and then delete by primary key.
You can find ways to do that in this answer.
Upvotes: 1