Reputation: 1754
How can I delete a row from Cassandra and get the value it had just before the deletion?
I could execute a SELECT and a DELETE query in series, but how can I be sure that the data was not altered concurrently between the execution of those two queries?
I've tried to execute the SELECT and DELETE queries in a batch, but that does not seem to be allowed.
cqlsh:foo> BEGIN BATCH
... SELECT * FROM data_by_user WHERE user = 'foo';
... DELETE FROM data_by_user WHERE user = 'foo';
... APPLY BATCH;
SyntaxException: line 2:4 mismatched input 'SELECT' expecting K_APPLY (BEGIN BATCH [SELECT]...)
In my use case I have one main table that stores data for items, and I've built several tables that allow looking up items based on that data. If I delete an item from the main table, I must also remove it from the other tables.
CREATE TABLE items (id text PRIMARY KEY, owner text, liking_users set<text>, ...);
CREATE TABLE owned_items_by_user (user text, item_id text, PRIMARY KEY ((user), item_id));
CREATE TABLE liked_items_by_user (user text, item_id text, PRIMARY KEY ((user), item_id));
...
I'm afraid the tables might contain wrong data if I delete an item and at the same time someone e.g. hits the like button of that same item.
1. The deleteItem method executes a SELECT query to fetch the current row of the item from the main table.
2. A likeItem method that gets executed at the same time runs an UPDATE query and inserts the item into the owned_items_by_user, liked_items_by_user, ... tables. This happens after the SELECT statement was executed, and the UPDATE query is executed before the DELETE query.
3. The deleteItem method deletes the item from the owned_items_by_user, liked_items_by_user, ... tables based on the data just retrieved via the SELECT statement. This data does not yet contain the just-added like. The item is therefore deleted, but the just-added like remains in the liked_items_by_user table.
Upvotes: 1
Views: 1813
Reputation: 6717
You can do a select beforehand, then do a lightweight transaction on the delete to ensure that the data still looks exactly like it did when you selected. If it does, you know the latest state before you deleted. If it does not, keep retrying the whole procedure until it sticks.
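A minimal sketch of that retry loop, assuming the DataStax Python driver (cassandra-driver) and the items table from the question. The function name delete_item_returning_previous, the max_attempts parameter, and the choice to compare the owner and liking_users columns are illustrative assumptions, not part of the original suggestion.

```python
def delete_item_returning_previous(session, item_id, max_attempts=10):
    """Delete an item and return the row as it looked when it was deleted.

    `session` is assumed to behave like a cassandra-driver Session:
    execute(query, params) returns a result exposing .one() and, for
    conditional statements, a .was_applied flag.
    """
    for _ in range(max_attempts):
        # Step 1: read the current state of the row.
        row = session.execute(
            "SELECT owner, liking_users FROM items WHERE id = %s",
            (item_id,),
        ).one()
        if row is None:
            return None  # nothing to delete

        # Step 2: lightweight transaction. The DELETE only applies if the
        # row still matches what was just read.
        result = session.execute(
            "DELETE FROM items WHERE id = %s "
            "IF owner = %s AND liking_users = %s",
            (item_id, row.owner, row.liking_users),
        )
        if result.was_applied:
            return row  # this is the state that was actually deleted

        # Someone changed the row in between: loop and read again.
    raise RuntimeError("item kept changing; gave up after max_attempts")
```

The returned row can then be used to clean up owned_items_by_user and liked_items_by_user, since it reflects the likes that existed at the moment the delete was applied. Note that lightweight transactions cost extra round trips, so this is a trade-off rather than a free fix.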
Upvotes: 2
Reputation: 28511
Unfortunately you cannot do a SELECT query inside a batch statement. If you read the docs here, only insert, update, and delete statements can be used.
What you're looking for is atomicity on the execution, but batch statements are not going to be the way forward. If the data has been altered, your worst case situation is zombies, or data that could reappear.
Cassandra uses a grace period mechanism to deal with this; you can find the details here. If, for whatever reason, this is critical to your business logic, the "best" thing you can do in this situation is to increase the consistency level, or restructure the read pattern at application level so that it does not rely on perfect atomicity, whichever is the right trade-off for you. So either you give up some of the performance, or you tune down the requirement.
In practice, QUORUM should be more than enough to satisfy most situations most of the time. Alternatively, you can do an ALL, and you pay the performance penalty, but that means all replicas for the given foo partition key will have to acknowledge the write both in the commitlog and the memtable. Note, this still means a flush from the commitlog will need to happen before the delete is complete, but you can tune the consistency to the level you require.
You don't have atomicity in the SQL sense, but depending on throughput it's unlikely that you will need it (touch wood).
TLDR:
CONSISTENCY ALL;
DELETE FROM data_by_user WHERE user = 'foo';
That should do the trick. The error you're seeing now comes from the ANTLR3 grammar parser for CQL 3, which is not designed to accept SELECT queries inside batches, simply because they are not supported. You can see that here.
Upvotes: 0