Reputation: 1578
This week I read an interesting article which explains how the authors implemented an activity feed. Basically, they use two approaches to handle activities, which I'm adapting to my scenario. Suppose we have a user foo who has a certain number (x) of followers:
if x < 500, the activity is copied to every follower's feed
this means slow writes, fast reads
if x >= 500, only a link is made between foo and his followers
in theory, this means fast writes but slow reads
So when a user accesses their activity feed, the server fetches and merges all the data: a fast lookup in their own copied activities, then a query across the links. If a timeline has a limit of 20, I fetch 10 from each source and then merge.
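To make the two paths concrete, here is a minimal in-memory sketch of the hybrid scheme described above. Everything in it is an assumption for illustration: the dictionaries stand in for whatever store is used (this is not Riak client code), the names `follow`, `publish`, and `read_feed` are hypothetical, and activities are assumed to be published in increasing timestamp order. It also merges strictly by timestamp instead of taking a fixed 10 from each source, which is a swapped-in choice so that one quiet source cannot push stale items into the timeline.

```python
import heapq
from collections import defaultdict

FANOUT_THRESHOLD = 500  # cutoff from the question; the right value needs testing

followers = defaultdict(set)   # author -> readers who follow them
following = defaultdict(set)   # reader -> authors they follow
feeds = defaultdict(list)      # reader -> copied activities, newest first
outbox = defaultdict(list)     # author -> own activities, newest first


def follow(reader, author):
    followers[author].add(reader)
    following[reader].add(author)


def publish(author, timestamp, text):
    """Store the activity once; copy it to followers only below the cutoff."""
    activity = (timestamp, author, text)
    outbox[author].insert(0, activity)  # assumes increasing timestamps
    if len(followers[author]) < FANOUT_THRESHOLD:
        # Slow write, fast read: every follower gets their own copy.
        for reader in followers[author]:
            feeds[reader].insert(0, activity)
    # Otherwise nothing is copied; readers reach it through `following`.


def read_feed(reader, limit=20):
    """Merge the reader's copied feed with the outboxes of linked authors."""
    linked = [
        outbox[author]
        for author in following[reader]
        if len(followers[author]) >= FANOUT_THRESHOLD
    ]
    merged = heapq.merge(
        feeds[reader], *linked, key=lambda activity: activity[0], reverse=True
    )
    result, seen = [], set()
    for activity in merged:
        # An author who crossed the cutoff may appear in both sources.
        if activity in seen:
            continue
        seen.add(activity)
        result.append(activity)
        if len(result) == limit:
            break
    return result
```

In a real store each linked outbox is a separate read, so the cost of `read_feed` grows with the number of high-follower authors a reader follows; that is the slow-read side of the trade-off.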
I'm trying to do this with Riak and its Linking feature, so here are my questions: is linking faster than copying? Is my architecture good enough? Are there other solutions and/or technologies I should look at?
PS: I'm not implementing an activity feed for production; it's just for learning how to implement one that performs well, and for using Riak a bit.
Upvotes: 1
Views: 302
Reputation: 1115
Two thoughts.
1) No, Linking (in the sense of Riak Link Walking) is very likely not the right way to implement this. For one, each link is stored as a separate HTTP header, and there is a recommended limit in the HTTP spec on how many header fields you should send. (To be fair, in tests you can use upwards of 1,000 links in the header with Riak and it seems to work fine, but it's not recommended.) More importantly, querying those links via the Link Walking API actually uses MapReduce on the backend, and is fairly slow for the kind of usage you intend.
This is not to say that you can't store JSON objects that are lists of links; that's a valid approach. I'm just recommending against using Riak links for this.
2) As for how to properly implement it, that's a harder question, and depends on your traffic and use case. But your general approach is valid -- copy the feed for some X value of updates (whether X is 500 or much smaller should be determined in testing), and link when the number of updates is greater than X.
How should you link? You have 3 choices, all with tradeoffs. 1) Use Secondary Indices (2i), 2) Use Search, or 3) Use links "manually", meaning, store JSON documents with URLs that you dereference manually (versus using link walking queries).
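As a rough illustration of the third option, the sketch below keeps a per-author index document that is just a JSON list of keys, and dereferences those keys one by one on read. It is a sketch under stated assumptions, not Riak client code: the `store` dictionary stands in for a key/value bucket, and the key layout (`activity/<id>`, `timeline/<author>`) and the helper names are hypothetical.

```python
import json

store = {}  # stand-in for a key/value bucket: key -> JSON string


def put(key, doc):
    store[key] = json.dumps(doc)


def get(key):
    raw = store.get(key)
    return None if raw is None else json.loads(raw)


def add_activity(author, activity_id, timestamp, text):
    """Write the activity, then prepend its key to the author's index document."""
    key = f"activity/{activity_id}"
    put(key, {"author": author, "timestamp": timestamp, "text": text})

    index_key = f"timeline/{author}"
    index = get(index_key) or {"links": []}
    index["links"].insert(0, key)  # newest first
    put(index_key, index)


def load_timeline(author, limit=20):
    """Read the index document, then dereference each linked key manually."""
    index = get(f"timeline/{author}") or {"links": []}
    docs = (get(key) for key in index["links"][:limit])
    return [doc for doc in docs if doc is not None]
```

Reading a timeline this way costs one fetch for the index plus one per activity, and the read-modify-write on the index document would need conflict handling under concurrent writers in a distributed store, which this sketch ignores.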
I highly recommend watching this video: http://vimeo.com/album/2258285/page:2/sort:preset/format:thumbnail (Building a Social Application on Riak), by the Clipboard engineers, to see how they solved this problem. (They used Search for linking, basically).
Upvotes: 2