Aklin

Reputation: 2649

Schema design for Cassandra

I am working on a forum project that lets a user follow questions on certain topics from his network.

A user's news feed consists only of questions that were posted by his connections and tagged with topics he follows. I am unsure which database's data model would best fit such an application. So far I have been looking at Cassandra and MySQL.

From what I have learned about Cassandra, a simple news feed that shows every post from a user's network would be easy to build: each new post is written quickly to the feeds of all of the author's followers. But for my application, which adds a 'followed topics' filter, I could not come up with a Cassandra schema that convinced me. I may have missed something because of my limited understanding of Cassandra. Could you suggest how this news feed could be implemented in Cassandra?

Upvotes: 3

Views: 2582

Answers (1)

Tyler Hobbs

Reputation: 6932

I'm assuming you've already studied the Twissandra example application. It's very close to what you're describing. Here are a couple of useful links:

The primary difference in your application is the introduction of topics. How you store the data depends on exactly how you want to be able to query it. For example, you might be fine with all topics being presented in the same timeline, or you might want to be able to see a timeline for a specific topic only (as with SO tags).

If you don't need separate timelines, I recommend the following, using the Twissandra data model as the base:

Instead of the normal FOLLOWERS column family, maintain one row of followers for every user for each topic. Obviously, this causes a little extra work when creating/altering/dropping users, but it saves you work when new posts are created, which is the bulk of the operations you need to handle.
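As a rough illustration of that bookkeeping, here is a minimal in-memory sketch in Python. It does not use a Cassandra client; a dict of sets stands in for the FOLLOWERS column family, and the `follow`/`unfollow` helpers and the `user::topic` row-key format are assumptions chosen for the example, not part of Twissandra.

```python
from collections import defaultdict

# In-memory stand-in for the FOLLOWERS column family (an assumption,
# not a real Cassandra client): row key "user::topic" -> follower names.
followers = defaultdict(set)

def row_key(user, topic):
    """Build the hypothetical per-user, per-topic row key."""
    return f"{user}::{topic}"

def follow(follower, followee, topics):
    """Record that `follower` wants `followee`'s posts on each topic."""
    for topic in topics:
        followers[row_key(followee, topic)].add(follower)

def unfollow(follower, followee, topics):
    """Remove `follower` from the given per-topic rows of `followee`."""
    for topic in topics:
        followers[row_key(followee, topic)].discard(follower)

follow("alice", "Joe", ["A", "B"])
follow("bob", "Joe", ["B"])
unfollow("alice", "Joe", ["A"])
```

The extra cost shows up here: one follow or topic change touches one row per topic, which is the trade-off accepted in exchange for cheap lookups at post time.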

When a post is made by user Joe on topics A, B, and C, you'll be able to get all of the interested users with a query like:

multiget(FOLLOWERS, ['Joe::A', 'Joe::B', 'Joe::C'])

where 'Joe::A', 'Joe::B', and 'Joe::C' are row keys. For each of the followers that you get back, you can simply add the post's UUID as a column name to each follower's timeline (and you won't have to worry about duplicates in the timeline since you're using the same UUID for the column name).
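The fan-out step could look roughly like the following Python sketch. Plain dicts stand in for the FOLLOWERS and timeline column families, and `multiget` and `publish` here are local helper functions written for the example, not calls into a real client library. Storing the post under its UUID means a follower who matches several topics still ends up with a single timeline entry.

```python
import uuid
from collections import defaultdict

# In-memory stand-ins for the column families (assumptions for illustration).
followers = {
    "Joe::A": {"alice"},
    "Joe::B": {"alice", "bob"},
    "Joe::C": set(),
}
timelines = defaultdict(dict)  # follower -> {post UUID: post body}

def multiget(column_family, keys):
    """Local helper: fetch several rows at once, empty set if a row is missing."""
    return {key: column_family.get(key, set()) for key in keys}

def publish(author, topics, body):
    """Write one post to the timeline of every follower interested in its topics."""
    post_id = uuid.uuid1()  # time-based UUID, used as the column name
    rows = multiget(followers, [f"{author}::{topic}" for topic in topics])
    for interested in rows.values():
        for follower in interested:
            # Same UUID key each time, so repeated writes just overwrite.
            timelines[follower][post_id] = body
    return post_id

post_id = publish("Joe", ["A", "B", "C"], "How do I model this?")
```

In this run alice follows both A and B, yet her timeline holds the post once.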

If you want to be able to support per-topic timelines for each user, I suggest you use one row for each topic that a user is interested in and one row for the all-topics timeline. Since you are already fetching followers by topic, you know which of the post's topics each follower is interested in, so it's easy to append the post to the correct per-topic timelines.
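A minimal Python sketch of the per-topic variant, again with dicts standing in for column families. The timeline row keys `follower::topic` and `follower::ALL` are a naming assumption made for this example.

```python
import uuid
from collections import defaultdict

# In-memory stand-ins for the column families (assumptions for illustration).
followers = {"Joe::A": {"alice"}, "Joe::B": {"alice", "bob"}}
# Timeline rows keyed "follower::topic", plus "follower::ALL" for everything.
timelines = defaultdict(dict)

def publish(author, topics, body):
    """Append a post to each follower's matching per-topic and all-topics rows."""
    post_id = uuid.uuid1()
    for topic in topics:
        # The row that was read tells us which topic matched for each follower.
        for follower in followers.get(f"{author}::{topic}", set()):
            timelines[f"{follower}::{topic}"][post_id] = body
            timelines[f"{follower}::ALL"][post_id] = body
    return post_id

post_id = publish("Joe", ["A", "B"], "hello")
```

Here alice gets the post in her A row, her B row, and once in her all-topics row, while bob only gets it in his B row and all-topics row.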

Upvotes: 4
