philhan
philhan

Reputation: 620

Cassandra schema use set collection or multiple rows

I'm designing a keyspace in Cassandra that will hold information about groups of users. Some info on it:

I have two designs that I'm considering for this.

  1. Multiple rows per group: The table would have two columns of type TEXT and be keyed on Primary Key (GroupID, UserID) and reading the users in a group would be done by select * from table where GroupID = {GroupID} and would return as many rows as there are users in the group.
  2. One row per group using the Cassandra Set Collection: The table would have two columns, the first (GroupID) of type TEXT and the second (UserIDs) of type SET[TEXT] and be keyed on Pimary Key (GroupID). Reading the users in a gorup would be done by select * from table where GroupID = {GroupID} and would return a single row with the set of user ids contained in its UserIDs column set.

I can't find a lot of documentation surrounding what would be the better design for this scenario. Any thoughts or pros and cons to either scenario?

Upvotes: 4

Views: 1753

Answers (2)

Alfredo Gimenez
Alfredo Gimenez

Reputation: 2224

From Datastax:

When to use collections

Using collections

Looks like in your case, using collections won't hit the allowable limits. Each item UserID is probably <=64K, and you have <= 20K entries, which is less than the 64K maximum.

However, as Jeff said, your performance will likely be much better using the compound key, partitioning by GroupID (your query column) and clustering by UserID (the values you are receiving from your query).

Upvotes: 3

Jeff Jirsa
Jeff Jirsa

Reputation: 4426

For a group of 20k user IDs, I would absolutely avoid using collections at all costs. Collections are a convenience feature, but they're not nearly as performant as using a traditional CQL data model where you have the PRIMARY KEY(GroupID,UserID) where all users are ordered in a single partition. That will be both easy to reason about, easy to query (can SELECT either a single partition and page through all group members, or you can SELECT ... WHERE GroupID=X and UserID=Y to determine if a user is in the group), and very performant.

Upvotes: 6

Related Questions