Cassandra - alternate way for clustering key with ORDER BY and UPDATE

Question

My schema is :

CREATE TABLE friends (
     userId timeuuid,
     friendId timeuuid,
     status varchar, 
     ts timeuuid,   
     PRIMARY KEY (userId,friendId)
);

CREATE TABLE friends_by_status (
    userId timeuuid,
    friendId timeuuid,
    status varchar, 
    ts timeuuid,   
    PRIMARY KEY ((userId,status), ts)
)with clustering order by (ts desc);

Here, whenever a friend-request is made, I'll insert record in both tables. When I want to check one to one status of users, i'll use this query:

SELECT status FROM friends WHERE userId=xxx AND friendId=xxx;

When I need to query all the records with pending status, i'll use :

SELECT * FROM friends_by_status WHERE userId=xxx AND status='pending';

But, when there is a status change, I can update the 'status' and 'ts' in the 'friends' table, but not in the 'friends_by_status' table as both are part of PRIMARY KEY.

You could see that even if I denormalise it, I definitely need to update 'status' and 'ts' in 'friends_by_status' table to maintain consistency.

Only way I can maintain consistency is to delete the record and insert again.
But frequent delete is also not recommended in cassandra model. As said in Cassaandra Spottify summit.

I find this as the biggest limitation in Cassandra.

Is there any other way to sort this issue.

Any solution is appreciated.

Jim Meyer · Accepted Answer

I don't know how soon you need to deploy this, but in Cassandra 3.0 you could handle this with a materialized view. Your friends table would be the base table, and the friends_by_status would be a view of the base table. Cassandra would take care updating the view when you changed the base table.

For example:

CREATE TABLE friends ( userid int, friendid int, status varchar, ts timeuuid, PRIMARY KEY (userId,friendId) );
CREATE MATERIALIZED VIEW friends_by_status AS
    SELECT userId from friends WHERE userID IS NOT NULL AND friendId IS NOT NULL AND status IS NOT NULL AND ts IS NOT NULL
    PRIMARY KEY ((userId,status), friendID);

INSERT INTO friends (userid, friendid, status, ts) VALUES (1, 500, 'pending', now());
INSERT INTO friends (userid, friendid, status, ts) VALUES (1, 501, 'accepted', now());
INSERT INTO friends (userid, friendid, status, ts) VALUES (1, 502, 'pending', now());
SELECT * FROM friends;                

 userid | friendid | status   | ts
--------+----------+----------+--------------------------------------
      1 |      500 |  pending | a02f7fe0-49f9-11e5-9e3c-ab179e6a6326
      1 |      501 | accepted | a6c80980-49f9-11e5-9e3c-ab179e6a6326
      1 |      502 |  pending | add10830-49f9-11e5-9e3c-ab179e6a6326

So now in the view you can select rows by the status:

SELECT * FROM friends_by_status WHERE userid=1 AND status='pending';

 userid | status  | friendid
--------+---------+----------
      1 | pending |      500
      1 | pending |      502

(2 rows)

And then when you update the status in the base table, it automatically updates in the view:

UPDATE friends SET status='pending' WHERE userid=1 AND friendid=501;
SELECT * FROM friends_by_status WHERE userid=1 AND status='pending';

 userid | status  | friendid
--------+---------+----------
      1 | pending |      500
      1 | pending |      501
      1 | pending |      502

(3 rows)

But note that in the view you couldn't have ts as part of the key, since you can only add one non-key field from the base table as part of the key in the view, which in your case would be adding 'status' to the key.

I think the first beta release for 3.0 is coming out tomorrow if you want to try this out.

Cassandra - alternate way for clustering key with ORDER BY and UPDATE

Answers (2)

Related Questions