How do I model multiple "many to many" relationships in Cassandra?

Question

I've been reading up on Cassandra, I've done some tutorials and played around with CQL but now that it is time for me to design a schema I'm having some difficulty.

I'm trying to create a schema that will handle the following use case. I need to keep track of the workers that attend meetings and the topics they discuss in those meetings. So a meeting can have multiple workers attend it, multiple topics are discussed at each meeting and each worker can create multiple topics. These are the data fields:

Worker: Worker ID, Worker Name

Meeting: Meeting ID, Meeting Name, Meeting Time

Topic: Topic ID, Topic Name, Creator

I need queries to see:

who is attending a meeting?
what meetings a worker has attended in the past?
what topics a worker has created?
what meetings have discussed a particular topic?

So what should the schema look like to handle this? I feel like this shouldn't be that hard but I can't make it make sense when I start making tables.

Aaron · Accepted Answer

It's important to remember that Cassandra data modeling is a query-driven exercise. As you have four queries to complete above, you may end-up creating four tables: one for each query needed.

I want you to be able to learn, so I won't do it all for you. But here is how I would solve for queries #1 and #2. For #1, I would create a table like this:

CREATE TABLE meetingAttendance (
  meetingID uuid,
  meetingName text,
  meetingTime timestamp,
  workerID uuid,
  workerName text,
  PRIMARY KEY ((meetingID),workerName));

I'll go with meetingID as a partition key, and I'll cluster by workerName so that they come back in order.

For query #2, I'll create a query table like this:

CREATE TABLE meetingsByWorker (
  workerID uuid,
  workerName text,
  meetingID uuid,
  meetingName text,
  meetingTime timestamp,
  topicID uuid,
  topicName text,
  PRIMARY KEY ((workerID),meetingTime))
WITH CLUSTERING ORDER BY (meetingtime DESC);

As we are querying meetings that a particular worker has attended, I'll partition on workerID. As meetings are time-based, it makes sense to sort them by meetingTime. By default they would sort is ASCending order, but historical data usually makes sense to look at in DESCending order, so I'll define a specific CLUSTERING ORDER and sort direction (DESC).

After INSERTing some rows into both tables, I can query attendance for a particular meeting like this:

aploetz@cqlsh:stackoverflow2> SELECT * FROM meetingattendance 
    WHERE meetingid=031e457b-2660-448b-a1d5-68c6cce3a820;

 meetingid                            | workername    | meetingname        | meetingtime              | workerid
--------------------------------------+---------------+--------------------+--------------------------+--------------------------------------
 031e457b-2660-448b-a1d5-68c6cce3a820 |         David | Project Prometheus | 2093-12-25 08:08:00-0600 | b83cbec4-95e5-4457-b037-c28c51d00418
 031e457b-2660-448b-a1d5-68c6cce3a820 | Holloway, Dr. | Project Prometheus | 2093-12-25 08:08:00-0600 | d28b4ee8-b1b9-401a-88d4-bc6b9727d712
 031e457b-2660-448b-a1d5-68c6cce3a820 |  Janek, Capt. | Project Prometheus | 2093-12-25 08:08:00-0600 | ebccf3ba-c1d2-4503-b717-897c7e89d968
 031e457b-2660-448b-a1d5-68c6cce3a820 |     Shaw, Dr. | Project Prometheus | 2093-12-25 08:08:00-0600 | c0e3e560-2332-4a46-9fdf-68bdb31abcb2
 031e457b-2660-448b-a1d5-68c6cce3a820 |       Vickers | Project Prometheus | 2093-12-25 08:08:00-0600 | 77cb9f64-3cb8-43f9-ab0c-b907b01c4404

(5 rows)
aploetz@cqlsh:stackoverflow2> SELECT * FROM meetingattendance
    WHERE meetingid=c7cea773-4c99-445f-928d-5b8a511c843b;

 meetingid                            | workername | meetingname      | meetingtime              | workerid
--------------------------------------+------------+------------------+--------------------------+--------------------------------------
 c7cea773-4c99-445f-928d-5b8a511c843b |      David | Wake Mr. Weyland | 2093-12-29 13:01:00-0600 | b83cbec4-95e5-4457-b037-c28c51d00418
 c7cea773-4c99-445f-928d-5b8a511c843b |  Ford, Dr. | Wake Mr. Weyland | 2093-12-29 13:01:00-0600 | 939657c2-e0cb-4a61-87d8-2a1739161d2a
 c7cea773-4c99-445f-928d-5b8a511c843b |    Vickers | Wake Mr. Weyland | 2093-12-29 13:01:00-0600 | 77cb9f64-3cb8-43f9-ab0c-b907b01c4404
 c7cea773-4c99-445f-928d-5b8a511c843b |    Weyland | Wake Mr. Weyland | 2093-12-29 13:01:00-0600 | 306955b8-c7ee-4350-8aa4-4c5d64487d74

(4 rows)

Now if I want to see which meetings a particular worker has attended, I can also query for that, by workerID:

aploetz@cqlsh:stackoverflow2> SELECT workername, meetingtime, meetingid, meetingname
    FROM meetingsbyworker WHERE workerid=77cb9f64-3cb8-43f9-ab0c-b907b01c4404;

 workername | meetingtime              | meetingid                            | meetingname
------------+--------------------------+--------------------------------------+--------------------
    Vickers | 2093-12-29 13:01:00-0600 | c7cea773-4c99-445f-928d-5b8a511c843b |   Wake Mr. Weyland
    Vickers | 2093-12-26 18:22:00-0600 | 3ea1282b-a465-4626-bd76-c65dd17b9f26 |   Head Examination
    Vickers | 2093-12-25 08:08:00-0600 | 031e457b-2660-448b-a1d5-68c6cce3a820 | Project Prometheus

(3 rows)
aploetz@cqlsh:stackoverflow2> SELECT workername, meetingtime, meetingid, meetingname
    FROM meetingsbyworker WHERE workerid=939657c2-e0cb-4a61-87d8-2a1739161d2a;

 workername | meetingtime              | meetingid                            | meetingname
------------+--------------------------+--------------------------------------+------------------
  Ford, Dr. | 2093-12-29 13:01:00-0600 | c7cea773-4c99-445f-928d-5b8a511c843b | Wake Mr. Weyland
  Ford, Dr. | 2093-12-26 18:22:00-0600 | 3ea1282b-a465-4626-bd76-c65dd17b9f26 | Head Examination

(2 rows)

Note that the data has been denormalized, and some column values appear redundantly. If you decide that you still want entity tables for things like worker, that's ok too. But again, ask yourself how often and how, exactly you plan on querying those tables. The last two should be easy for you to solve by taking a similar approach.

How do I model multiple "many to many" relationships in Cassandra?

Answers (1)

Related Questions

How do I model multiple &quot;many to many&quot; relationships in Cassandra?

Answers (1)

Related Questions

How do I model multiple "many to many" relationships in Cassandra?