Reputation: 83
I've been reading up on Cassandra, I've done some tutorials and played around with CQL but now that it is time for me to design a schema I'm having some difficulty.
I'm trying to create a schema that will handle the following use case. I need to keep track of the workers that attend meetings and the topics they discuss in those meetings. So a meeting can have multiple workers attend it, multiple topics are discussed at each meeting and each worker can create multiple topics. These are the data fields:
Worker: Worker ID, Worker Name
Meeting: Meeting ID, Meeting Name, Meeting Time
Topic: Topic ID, Topic Name, Creator
I need queries to see:
So what should the schema look like to handle this? I feel like this shouldn't be that hard but I can't make it make sense when I start making tables.
Upvotes: 3
Views: 308
Reputation: 57798
It's important to remember that Cassandra data modeling is a query-driven exercise. As you have four queries to complete above, you may end-up creating four tables: one for each query needed.
I want you to be able to learn, so I won't do it all for you. But here is how I would solve for queries #1 and #2. For #1, I would create a table like this:
CREATE TABLE meetingAttendance (
meetingID uuid,
meetingName text,
meetingTime timestamp,
workerID uuid,
workerName text,
PRIMARY KEY ((meetingID),workerName));
I'll go with meetingID
as a partition key, and I'll cluster by workerName
so that they come back in order.
For query #2, I'll create a query table like this:
CREATE TABLE meetingsByWorker (
workerID uuid,
workerName text,
meetingID uuid,
meetingName text,
meetingTime timestamp,
topicID uuid,
topicName text,
PRIMARY KEY ((workerID),meetingTime))
WITH CLUSTERING ORDER BY (meetingtime DESC);
As we are querying meetings that a particular worker has attended, I'll partition on workerID
. As meetings are time-based, it makes sense to sort them by meetingTime
. By default they would sort is ASC
ending order, but historical data usually makes sense to look at in DESC
ending order, so I'll define a specific CLUSTERING ORDER and sort direction (DESC
).
After INSERTing some rows into both tables, I can query attendance for a particular meeting like this:
aploetz@cqlsh:stackoverflow2> SELECT * FROM meetingattendance
WHERE meetingid=031e457b-2660-448b-a1d5-68c6cce3a820;
meetingid | workername | meetingname | meetingtime | workerid
--------------------------------------+---------------+--------------------+--------------------------+--------------------------------------
031e457b-2660-448b-a1d5-68c6cce3a820 | David | Project Prometheus | 2093-12-25 08:08:00-0600 | b83cbec4-95e5-4457-b037-c28c51d00418
031e457b-2660-448b-a1d5-68c6cce3a820 | Holloway, Dr. | Project Prometheus | 2093-12-25 08:08:00-0600 | d28b4ee8-b1b9-401a-88d4-bc6b9727d712
031e457b-2660-448b-a1d5-68c6cce3a820 | Janek, Capt. | Project Prometheus | 2093-12-25 08:08:00-0600 | ebccf3ba-c1d2-4503-b717-897c7e89d968
031e457b-2660-448b-a1d5-68c6cce3a820 | Shaw, Dr. | Project Prometheus | 2093-12-25 08:08:00-0600 | c0e3e560-2332-4a46-9fdf-68bdb31abcb2
031e457b-2660-448b-a1d5-68c6cce3a820 | Vickers | Project Prometheus | 2093-12-25 08:08:00-0600 | 77cb9f64-3cb8-43f9-ab0c-b907b01c4404
(5 rows)
aploetz@cqlsh:stackoverflow2> SELECT * FROM meetingattendance
WHERE meetingid=c7cea773-4c99-445f-928d-5b8a511c843b;
meetingid | workername | meetingname | meetingtime | workerid
--------------------------------------+------------+------------------+--------------------------+--------------------------------------
c7cea773-4c99-445f-928d-5b8a511c843b | David | Wake Mr. Weyland | 2093-12-29 13:01:00-0600 | b83cbec4-95e5-4457-b037-c28c51d00418
c7cea773-4c99-445f-928d-5b8a511c843b | Ford, Dr. | Wake Mr. Weyland | 2093-12-29 13:01:00-0600 | 939657c2-e0cb-4a61-87d8-2a1739161d2a
c7cea773-4c99-445f-928d-5b8a511c843b | Vickers | Wake Mr. Weyland | 2093-12-29 13:01:00-0600 | 77cb9f64-3cb8-43f9-ab0c-b907b01c4404
c7cea773-4c99-445f-928d-5b8a511c843b | Weyland | Wake Mr. Weyland | 2093-12-29 13:01:00-0600 | 306955b8-c7ee-4350-8aa4-4c5d64487d74
(4 rows)
Now if I want to see which meetings a particular worker has attended, I can also query for that, by workerID
:
aploetz@cqlsh:stackoverflow2> SELECT workername, meetingtime, meetingid, meetingname
FROM meetingsbyworker WHERE workerid=77cb9f64-3cb8-43f9-ab0c-b907b01c4404;
workername | meetingtime | meetingid | meetingname
------------+--------------------------+--------------------------------------+--------------------
Vickers | 2093-12-29 13:01:00-0600 | c7cea773-4c99-445f-928d-5b8a511c843b | Wake Mr. Weyland
Vickers | 2093-12-26 18:22:00-0600 | 3ea1282b-a465-4626-bd76-c65dd17b9f26 | Head Examination
Vickers | 2093-12-25 08:08:00-0600 | 031e457b-2660-448b-a1d5-68c6cce3a820 | Project Prometheus
(3 rows)
aploetz@cqlsh:stackoverflow2> SELECT workername, meetingtime, meetingid, meetingname
FROM meetingsbyworker WHERE workerid=939657c2-e0cb-4a61-87d8-2a1739161d2a;
workername | meetingtime | meetingid | meetingname
------------+--------------------------+--------------------------------------+------------------
Ford, Dr. | 2093-12-29 13:01:00-0600 | c7cea773-4c99-445f-928d-5b8a511c843b | Wake Mr. Weyland
Ford, Dr. | 2093-12-26 18:22:00-0600 | 3ea1282b-a465-4626-bd76-c65dd17b9f26 | Head Examination
(2 rows)
Note that the data has been denormalized, and some column values appear redundantly. If you decide that you still want entity tables for things like worker, that's ok too. But again, ask yourself how often and how, exactly you plan on querying those tables. The last two should be easy for you to solve by taking a similar approach.
Upvotes: 4