Reputation: 31
Suppose you have a social network with a billion users. On each user's page, you want to display the number of that users friends, friends of friends, and so on, out to five degrees. Friendships are reciprocal. The counts don't need to update right away, but they should be precise.
I read up on graphs, but I didn't find anything that suggested a scalable approach to this problem. Anything I could think of would take way too much time, way too much space, or both. This is driving me nuts!
Upvotes: 3
Views: 1777
Reputation: 678
It seems to me that the problem really comes down to how we hash/track 1 Billion users as we are counting the friends at each level. (Note that we only need to count them, NOT store them)
If we assume that for each person, their friend and the friends of their friends are of very small order (say <1000 and <100,000) it seems practical to keep these stored in database tables for each user. It only requires two manageable passes of the entire database and then straight-forward additions to the tables when a "new" relationship is created.
If we have 1st and 2nd degree friend stored in a users tables we can leverage those to extend as far as we need to -
EG: to COUNT 3rd degree friend we need to hash and track the 1st degree friends of all the 2nd degree friends. (for 4th degree you do all 2nd's of Seconds, for higher degrees you create the 4th and then extend appropriately to 5th or 6th).
So, at that point (5th and 6th degree friends) you are starting to approach 1 Billion as the number of people that you need to track, hash and count.
I'm thinking that the problem then becomes, what is the most efficient way to has 1 billion record-ID's as you "count" the friends in the higher order relationships.
How you do that, I don't know - any thoughts?
Upvotes: 0
Reputation: 18276
One interesting approach is to translate the friend graph into an adjacency matrix, and then raise the matrix to the 5th power. This gives you an adjacency matrix containing counts of the number of paths-of-length-5 between each node.
Note that you'll want a matrix multiplication algorithm that can take advantage of sparse matrices, since the friends adjacency matrix is likely to be sparse for the first couple levels. Lucky for you, people have a done a lot of work on how to multiply huge matrices (especially sparse ones) efficiently.
Here's a video where Twitter's Oscar Boykin mentions this approach for computing followers of followers at Twitter.
Upvotes: 4