RagHaven

Reputation: 4347

Getting database entry when performing delete operation in Cassandra

I have a web service that is maintaining the state of a "request". The possible states are "Active" and "InActive". I am storing the request information in a Cassandra DB. I have two tables - one for Active requests and another for InActive Requests. They both have the same schema.

My schema is as follows:

CREATE TABLE ActiveRequests (
  UserId text,
  RequestId int,
  RequestData text,
  PRIMARY KEY (UserId, RequestId)
);

I need to implement an API that will move a request from the Active state to the InActive state. I plan on doing this by deleting the entry from the Active table and then adding the removed entry to the InActive table.

In Cassandra, a DELETE operation doesn't seem to return the data that was deleted. So I have to do a SELECT on the request entry (to get all the request data for adding to the InActive table) and then do a DELETE. Is there a better way to do this?

EDIT

You may ask why I am maintaining Active and InActive requests as separate tables. I could potentially combine them into a single table and have an IsActive column. My reasoning for maintaining separate tables is as follows:

I want my queries to the Active table to be very quick. Querying all the Active requests from a table that holds both Active and InActive requests won't be as efficient. The partition key is UserId, and I expect the InActive table to have several thousand RequestIds for a given UserId, whereas the Active table should only have 10 or so RequestIds per UserId.

Upvotes: 1

Views: 52

Answers (1)

Jeff Beck

Reputation: 3938

The short answer is that Cassandra can't make DELETE return the data. A delete in Cassandra is actually a write of a tombstone, and Cassandra in general does not read before it writes. Needing a read before a write is considered an anti-pattern.

Another thing to remember is that a delete in Cassandra doesn't remove the data from the system right away. The tombstone and the data it covers stay on disk until a compaction runs sometime after the table's gc_grace_seconds setting has elapsed.
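If you do stay with two tables, the read has to be explicit. Here is a minimal sketch of that move in Python, not a definitive implementation. It assumes a `session` object with an `execute(query, params)` method whose result has `.one()`, which is how the DataStax Python driver's `Session` behaves. The `InActiveRequests` table name and the `move_to_inactive` helper are made up for illustration. The insert and delete go into one logged batch, which Cassandra applies in full once it has been accepted, though without isolation between the two partitions.

```python
# Hypothetical table names; the question only says both tables share one schema.
ACTIVE_TABLE = "ActiveRequests"
INACTIVE_TABLE = "InActiveRequests"


def move_to_inactive(session, user_id, request_id):
    """Copy one request into the inactive table and delete it from the active one.

    Returns True if the request was found and moved, False if it did not exist.
    """
    # Step 1: explicit read, because DELETE returns nothing.
    row = session.execute(
        f"SELECT RequestData FROM {ACTIVE_TABLE} "
        "WHERE UserId = %s AND RequestId = %s",
        (user_id, request_id),
    ).one()
    if row is None:
        return False

    # Step 2: one logged batch carrying both the insert and the delete.
    # Unquoted CQL identifiers are lowercased, hence row.requestdata.
    session.execute(
        "BEGIN BATCH "
        f"INSERT INTO {INACTIVE_TABLE} (UserId, RequestId, RequestData) "
        "VALUES (%s, %s, %s); "
        f"DELETE FROM {ACTIVE_TABLE} WHERE UserId = %s AND RequestId = %s; "
        "APPLY BATCH",
        (user_id, request_id, row.requestdata, user_id, request_id),
    )
    return True
```

Two concurrent callers could both read the row before either deletes it. In that case both batches write the same values, so the end state is still one inactive row and no active row.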

Are these requests time-based at all? If they are, you could think about bucketing the requests. You would then have a single table, something like:

CREATE TABLE Requests (
  UserId text,
  TimeBucket text,
  RequestId int,
  RequestData text,
  Active boolean,
  PRIMARY KEY ((UserId, TimeBucket), RequestId)
);

The time buckets could be per hour or per minute, whatever makes sense for your use case. You can then work through the relevant buckets with separate selects. This keeps you from having too many requests under a single partition key. The assumption is that one time bucket is big enough to cover most of the active requests, so you rarely need to look at all the buckets.
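As a rough sketch of the bucketing, assuming hourly buckets stored as UTC strings such as `2024-01-01T00`: the label format, the `time_bucket` and `recent_buckets` helpers and the query text are all illustrative choices, not something Cassandra prescribes. Because `Active` is not part of the primary key, the sketch selects it and leaves the filtering to the client rather than putting it in the WHERE clause.

```python
from datetime import datetime, timedelta, timezone


def time_bucket(ts):
    """Return the hourly bucket label (UTC) for a timezone-aware datetime."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H")


def recent_buckets(now, hours):
    """Return labels for the current hour and the hours before it, newest first."""
    return [time_bucket(now - timedelta(hours=i)) for i in range(hours)]


# Hypothetical per-bucket query: run it once for each label from
# recent_buckets(), then keep only the rows whose Active column is true.
REQUESTS_IN_BUCKET = (
    "SELECT RequestId, RequestData, Active FROM Requests "
    "WHERE UserId = %s AND TimeBucket = %s"
)
```

For example, `recent_buckets(datetime.now(timezone.utc), 3)` gives the three partitions to read for the last three hours, each of which is a single-partition query.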

I'm also not sure how long you plan to keep records. If they are kept for long periods of time, or forever, this bucketing will make sure you don't end up with overly big partitions, which could happen in the InActive table with the other setup.

Upvotes: 2
