Reputation: 620
I'm designing a keyspace in Cassandra that will hold information about groups of users. Some info on it:
I have two designs that I'm considering for this.
select * from table where GroupID = {GroupID}
and would return as many rows as there are users in the group.select * from table where GroupID = {GroupID}
and would return a single row with the set of user ids contained in its UserIDs column set.I can't find a lot of documentation surrounding what would be the better design for this scenario. Any thoughts or pros and cons to either scenario?
Upvotes: 4
Views: 1753
Reputation: 2224
From Datastax:
Looks like in your case, using collections won't hit the allowable limits. Each item UserID
is probably <=64K, and you have <= 20K entries, which is less than the 64K maximum.
However, as Jeff said, your performance will likely be much better using the compound key, partitioning by GroupID
(your query column) and clustering by UserID
(the values you are receiving from your query).
Upvotes: 3
Reputation: 4426
For a group of 20k user IDs, I would absolutely avoid using collections at all costs. Collections are a convenience feature, but they're not nearly as performant as using a traditional CQL data model where you have the PRIMARY KEY(GroupID,UserID)
where all users are ordered in a single partition. That will be both easy to reason about, easy to query (can SELECT
either a single partition and page through all group members, or you can SELECT ... WHERE GroupID=X and UserID=Y
to determine if a user is in the group), and very performant.
Upvotes: 6