Reputation: 31
Please help: I am new to the Cassandra world and need some advice.
I am trying to design a data model for a Cassandra DB.
In my project I have users who can follow each other, and articles that can be related to many topics.
Each user can follow many topics.
The goal is to build an aggregated feed in which a user gets articles from all the topics they follow, articles from all the friends they follow, and their own articles.
I searched for similar tasks and found the twissandra example project.
As I understand it, that example stores only the tweet ids in the timeline. To read a timeline, we fetch the tweet ids and then fetch each tweet by id in a separate non-blocking request. After collecting all the tweets, we return the list to the user.
So my first question is: is this efficient?
Making ~41 requests to the DB to get one page of tweets?
My second question is about followers. When someone creates a tweet, we get all of their followers and put the tweet id into each follower's timeline. But what if a user has thousands of followers?
Does that mean that creating a single tweet requires (1 + followers_count) writes to the DB?
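To make the two counts concrete, here is a minimal in-memory sketch of the scheme as I understand it. It does not talk to Cassandra; `FeedStore` and its methods are hypothetical names, and the page size of 40 is my assumption about where the ~41 comes from (1 id lookup + 40 tweet fetches).

```python
from collections import defaultdict


class FeedStore:
    """In-memory stand-in for the tweet and timeline tables (hypothetical).

    Every simulated database request bumps a counter.
    """

    def __init__(self):
        self.tweets = {}                    # tweet_id -> body
        self.timelines = defaultdict(list)  # user -> tweet ids, newest first
        self.followers = defaultdict(set)   # author -> users following them
        self.reads = 0
        self.writes = 0

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def post(self, author, tweet_id, body):
        # One write for the tweet itself...
        self.tweets[tweet_id] = body
        self.writes += 1
        # ...plus one timeline write per follower (fan-out on write).
        for user in self.followers[author]:
            self.timelines[user].insert(0, tweet_id)
            self.writes += 1

    def read_page(self, user, page_size=40):
        # One read for the page of ids...
        ids = self.timelines[user][:page_size]
        self.reads += 1
        # ...plus one read per tweet (a real client would issue these concurrently).
        page = []
        for tweet_id in ids:
            page.append(self.tweets[tweet_id])
            self.reads += 1
        return page


store = FeedStore()
for n in range(3):
    store.follow(f"fan{n}", "alice")
for tweet_id in range(40):
    store.post("alice", tweet_id, f"tweet {tweet_id}")
page = store.read_page("fan0")
print(store.writes)  # 40 tweets * (1 + 3 followers) = 160
print(store.reads)   # 1 id lookup + 40 tweet fetches = 41
```

So in this sketch each post costs 1 + followers_count writes, and each page costs 1 + page_size reads.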
Upvotes: 3
Views: 599
Reputation: 16420
twissandra is more of a toy example. It will work for some workloads, but you may well need to partition the data further (break up huge rows).
Essentially, though, yes, it is fairly efficient. It can be made more so by including the content in the timeline, but depending on your requirements that may be a bad idea (for example, if you need deleting/editing). The writes should be a non-issue: 20k writes/sec/node is reasonable, provided you have adequate systems.
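As a rough illustration of that trade-off, here is a minimal in-memory sketch (not Cassandra code) of a timeline that stores the content itself. `DenormalizedTimeline` is a hypothetical name, and it assumes article ids increase with time so that sorting by id gives newest-first order.

```python
from collections import defaultdict


class DenormalizedTimeline:
    """Timeline rows carry the article body, not just its id (hypothetical)."""

    def __init__(self):
        self.timelines = defaultdict(dict)  # user -> {article_id: body}
        self.followers = defaultdict(set)   # author -> users following them
        self.reads = 0
        self.writes = 0

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def post(self, author, article_id, body):
        # The body is copied into every follower's timeline.
        for user in self.followers[author]:
            self.timelines[user][article_id] = body
            self.writes += 1

    def read_page(self, user, page_size=40):
        # A single read returns the whole page, bodies included.
        self.reads += 1
        newest_first = sorted(self.timelines[user].items(), reverse=True)
        return [body for _, body in newest_first[:page_size]]

    def edit(self, author, article_id, new_body):
        # The cost of denormalizing: every copy has to be rewritten.
        for user in self.followers[author]:
            if article_id in self.timelines[user]:
                self.timelines[user][article_id] = new_body
                self.writes += 1


tl = DenormalizedTimeline()
for n in range(3):
    tl.follow(f"fan{n}", "alice")
tl.post("alice", 1, "first")
tl.post("alice", 2, "second")
page = tl.read_page("fan0")
print(page)       # ['second', 'first']
print(tl.reads)   # 1
tl.edit("alice", 1, "first (edited)")
print(tl.writes)  # 2 posts * 3 followers + 3 rewritten copies = 9
```

Reading a page drops to one request, but an edit or delete now fans out to every follower just like the original post did.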
If I understand your use case correctly, you will probably be fine with a twissandra-like schema, but be sure to test it with your expected workloads. Keep in mind that at a certain scale everything gets a little more complicated (i.e. if you expect millions of articles you will need further partitioning, see https://academy.datastax.com/demos/getting-started-time-series-data-modeling).
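One common way to do that further partitioning is to add a time bucket to the partition key, so that no single timeline row grows without bound. A minimal in-memory sketch, assuming a monthly bucket (the granularity and all names here, such as `bucket_for` and `BucketedTimeline`, are hypothetical and would need tuning against your real workload):

```python
from collections import defaultdict
from datetime import datetime, timezone


def bucket_for(ts):
    """Monthly bucket label such as '2015-06' (granularity is an assumption)."""
    return ts.strftime("%Y-%m")


class BucketedTimeline:
    """Timeline split into one partition per (user, month)."""

    def __init__(self):
        # (user, bucket) -> list of (timestamp, article_id)
        self.partitions = defaultdict(list)

    def add(self, user, ts, article_id):
        self.partitions[(user, bucket_for(ts))].append((ts, article_id))

    def read_page(self, user, buckets, page_size=40):
        """Walk the given buckets in order (newest first) until the page is full."""
        page = []
        for bucket in buckets:
            rows = sorted(self.partitions.get((user, bucket), []), reverse=True)
            for _, article_id in rows:
                page.append(article_id)
                if len(page) == page_size:
                    return page
        return page


tl = BucketedTimeline()
tl.add("bob", datetime(2015, 5, 30, tzinfo=timezone.utc), 1)
tl.add("bob", datetime(2015, 6, 1, tzinfo=timezone.utc), 2)
tl.add("bob", datetime(2015, 6, 2, tzinfo=timezone.utc), 3)
print(tl.read_page("bob", ["2015-06", "2015-05"], page_size=2))  # [3, 2]
print(len(tl.partitions))  # 2
```

The reader starts at the newest bucket and only touches older ones when the page is not yet full, so most page reads hit a single, bounded partition.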
Upvotes: 3