Reputation: 1010

Need help configuring MQ-like table for Cassandra

I have a table for notifications:

CREATE TABLE notifications (
  id timeuuid,
  created timestamp,
  to_user_id timeuuid,
  status varchar,
  msg text,
  PRIMARY KEY ((to_user_id,status),created)
) WITH CLUSTERING ORDER BY (created DESC);

Configured as shown above, as my 1st task is to pull by the recipient ("to_user_id" field) and the required notification status. Also, I'm puling last N records, that's why clustering order was used. This works fine.

However, I'm having issues with my 2nd task: updating notification statuses. If I'll try to update a "status" column, it will (obviously) throw an error that a Pk part status is found in set.

OK. Thinking that my use case should follow the request, I thought of having 2 tables, 2nd looking like

CREATE TABLE notifications_by_id (
  id timeuuid,
  created timestamp,
  to_user_id timeuuid,
  status varchar,
  msg text,
  PRIMARY KEY (id)
);

But we hit a different issue here - my primary table is the 1st one and to update it i need "to_user_id" and "status", which are not unique and not available during update requests...

Please advise on the "best practice". For some context - you may consider this notifications table as a message queue, without acks/nacks, with messages having attributes, and ability to sort by "to_user_id" and "status" fields.

Thank you! D.

Upvotes: 0

Answers (2)

oharlem

Reputation: 1010

Answering my own question.

I have been always stumbling upon that "status" field as I needed both - to restrict by it and be able to update it, so I decided what if I try to remove one unknown from this my equation.

The new paradigm was to actually mimic a message query system: instead of a "status" column, I've created 3 tables aka "queues": notifications_pending, notifications_ignored, notifications_accepted. And indeed, I do not really need to read a mixed set of notifications. Rather, one status per request only. Maybe status as a column was suggested by my rdbms habits.

OK. With "status" out of my way, we have the following model:

3 tables with notifications_pending serving as a queue, others - more like archives.
My status update actions now are a combination of 2 steps: removal from the "pending" table and insert into required status-specific tables based on the event type (accept or ignore).
All columns, are using clustering order desc

Please share your thoughts!

P.S. Sure, some new questions appear like for ex. tombstones after message "relocation", but that's def. be addressed separately.

Upvotes: 0

Nathan

Reputation: 3200

You need one table with a more precise primary key. Once that is in place you can perform update/insert (each one is an UPSERT in Cassandra) on the primary key for status updates.

The table below has partition keys user_id and status with a GROUPING column created. This particular set up for partitioning data would be more conducive in a table geared towards users, not notifications. Why? It would help answer the question, "What are the notifications of a given status for a given user?" If the notification object could be updated by something outside of a given user's actions, the notification should have the status attribute.

  CREATE TABLE notifications (
  id timeuuid,
  created timestamp,
  to_user_id timeuuid,
  status varchar,
  msg text,
  --OLD LINE PRIMARY KEY ((to_user_id,status),created)
  PRIMARY KEY ((id),created)
) WITH CLUSTERING ORDER BY (created DESC);

A second table for the notification status by user id could be created and the two kept in sync with a BATCH statement. Best practice is to name the tables after the query they server. notifications_by_user would be an appropriate table name for the second query.

As mentioned in the comment section, if you are interested in investigating the Materialized View functionality of Cassandra 3.0, check out this Datastax blog post. Be aware that they should be understood before implemented in production. From the blog post,

Materialized views handle automated server-side denormalization, removing the need for client side handling of this denormalization and ensuring eventual consistency between the base and view data. This denormalization allows for very fast lookups of data in each view using the normal Cassandra read path.

Hopefully this information helps you in your quest to serve data better.

Upvotes: 0

Need help configuring MQ-like table for Cassandra

Answers (2)

Related Questions