Badhrinath Canessane
Badhrinath Canessane

Reputation: 3528

Cassandra - alternate way for clustering key with ORDER BY and UPDATE

My schema is :

CREATE TABLE friends (
     userId timeuuid,
     friendId timeuuid,
     status varchar, 
     ts timeuuid,   
     PRIMARY KEY (userId,friendId)
);

CREATE TABLE friends_by_status (
    userId timeuuid,
    friendId timeuuid,
    status varchar, 
    ts timeuuid,   
    PRIMARY KEY ((userId,status), ts)
)with clustering order by (ts desc);

Here, whenever a friend-request is made, I'll insert record in both tables. When I want to check one to one status of users, i'll use this query:

SELECT status FROM friends WHERE userId=xxx AND friendId=xxx;

When I need to query all the records with pending status, i'll use :

SELECT * FROM friends_by_status WHERE userId=xxx AND status='pending';

But, when there is a status change, I can update the 'status' and 'ts' in the 'friends' table, but not in the 'friends_by_status' table as both are part of PRIMARY KEY.

You could see that even if I denormalise it, I definitely need to update 'status' and 'ts' in 'friends_by_status' table to maintain consistency.

Only way I can maintain consistency is to delete the record and insert again.
But frequent delete is also not recommended in cassandra model. As said in Cassaandra Spottify summit.

I find this as the biggest limitation in Cassandra.

Is there any other way to sort this issue.

Any solution is appreciated.

Upvotes: 3

Views: 976

Answers (2)

Jim Meyer
Jim Meyer

Reputation: 9475

I don't know how soon you need to deploy this, but in Cassandra 3.0 you could handle this with a materialized view. Your friends table would be the base table, and the friends_by_status would be a view of the base table. Cassandra would take care updating the view when you changed the base table.

For example:

CREATE TABLE friends ( userid int, friendid int, status varchar, ts timeuuid, PRIMARY KEY (userId,friendId) );
CREATE MATERIALIZED VIEW friends_by_status AS
    SELECT userId from friends WHERE userID IS NOT NULL AND friendId IS NOT NULL AND status IS NOT NULL AND ts IS NOT NULL
    PRIMARY KEY ((userId,status), friendID);

INSERT INTO friends (userid, friendid, status, ts) VALUES (1, 500, 'pending', now());
INSERT INTO friends (userid, friendid, status, ts) VALUES (1, 501, 'accepted', now());
INSERT INTO friends (userid, friendid, status, ts) VALUES (1, 502, 'pending', now());
SELECT * FROM friends;                

 userid | friendid | status   | ts
--------+----------+----------+--------------------------------------
      1 |      500 |  pending | a02f7fe0-49f9-11e5-9e3c-ab179e6a6326
      1 |      501 | accepted | a6c80980-49f9-11e5-9e3c-ab179e6a6326
      1 |      502 |  pending | add10830-49f9-11e5-9e3c-ab179e6a6326

So now in the view you can select rows by the status:

SELECT * FROM friends_by_status WHERE userid=1 AND status='pending';

 userid | status  | friendid
--------+---------+----------
      1 | pending |      500
      1 | pending |      502

(2 rows)

And then when you update the status in the base table, it automatically updates in the view:

UPDATE friends SET status='pending' WHERE userid=1 AND friendid=501;
SELECT * FROM friends_by_status WHERE userid=1 AND status='pending';

 userid | status  | friendid
--------+---------+----------
      1 | pending |      500
      1 | pending |      501
      1 | pending |      502

(3 rows)

But note that in the view you couldn't have ts as part of the key, since you can only add one non-key field from the base table as part of the key in the view, which in your case would be adding 'status' to the key.

I think the first beta release for 3.0 is coming out tomorrow if you want to try this out.

Upvotes: 4

Alec  Collier
Alec Collier

Reputation: 1523

Why do you need status to be in the primary key for your second table? If this was your schema:

CREATE TABLE friends_by_status (
userId timeuuid,
friendId timeuuid,
status varchar, 
ts timeuuid,   
PRIMARY KEY ((userId), status, ts) 
with clustering order by (ts desc));

you can update the status as needed and still filter by it. You will be storing more data under one partition but it seems like you are storing one row for each friend a user has. This will be the same as in the first table, so I don't see partition size being a problem.

Upvotes: 0

Related Questions