user1978317

Reputation: 343

Pulling wall/dashboard data like Facebook, Twitter, Tumblr, etc.

I feel this must have been asked elsewhere, but I couldn't figure out the correct search words to find an answer. If this is a duplicate, please point me to the correct response.

Services like Facebook, Twitter, Tumblr, and I'm sure a whole host of others allow you to follow other users. Their posts then appear on a wall or dashboard. I'm wondering how, with such large data sets, these services can pull posts so quickly. I assume they are not using a SQL server and they are not doing something like:

SELECT * FROM `posts` WHERE `poster_id` IN ( super long list of users being followed ) ORDER BY `date` DESC LIMIT 10;

As the above could have a very large list of user ids in it, and it likewise wouldn't work very well with sharding, which all these large services use.

So, can anyone describe what kind of queries, algorithms, or databases these services use to display the followed posts?

Edit: Thanks for everyone's responses. It seems the most likely way of doing this is with a graph database such as GraphDB, Neo4j, or FlockDB, the last of which is Twitter's graph database. With Neo4j, it is done roughly as documented at http://docs.neo4j.org/chunked/milestone/cypher-cookbook-newsfeed.html.

Of course, Google, Facebook, etc., all have their own, internally built or internally modified databases for their unique use cases.

Upvotes: 0

Views: 346

Answers (4)

rav

Reputation: 753

I can name a few techniques for processing and fetching data faster, but I'm not sure they are the same ones Facebook, Twitter, etc. use, as each of them is built on a different platform and architecture.

  1. Fetching the data from a memory cache: users get their data from memory without touching the DB.
  2. Splitting the processing across different servers: requests are handled by multiple servers to prevent bottlenecks.
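
The first technique can be sketched as a precomputed feed per user that is filled when a post is published, so reading the dashboard is a single memory lookup. This is only a minimal illustration, not how any of these services is known to work: the `FeedCache` class, its method names, and `FEED_SIZE` are all hypothetical, and a plain Python dict stands in for a real cache tier.

```python
from collections import defaultdict, deque

FEED_SIZE = 10  # hypothetical cap on cached entries per user


class FeedCache:
    """In-memory stand-in for a cache tier holding one precomputed feed per user."""

    def __init__(self):
        # poster_id -> ids of the users following that poster
        self.followers = defaultdict(set)
        # user_id -> newest-first posts, oldest entries dropped past FEED_SIZE
        self.feeds = defaultdict(lambda: deque(maxlen=FEED_SIZE))

    def follow(self, follower_id, poster_id):
        self.followers[poster_id].add(follower_id)

    def publish(self, poster_id, post):
        # Write path: copy the new post into every follower's cached feed.
        for follower_id in self.followers[poster_id]:
            self.feeds[follower_id].appendleft(post)

    def read_feed(self, user_id):
        # Read path: one lookup, no query over the list of followed users.
        return list(self.feeds[user_id])
```

The trade-off is that publishing costs one write per follower, while reading no longer depends on how many users someone follows.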

If you specifically want to know the stack Facebook uses, you could read this link: http://royal.pingdom.com/2010/06/18/the-software-behind-facebook/

Upvotes: 2

user1978317

Reputation: 343

Thanks for everyone's responses. It seems the most likely way of doing this is with a graph database such as GraphDB, Neo4j, or FlockDB, the last of which is Twitter's graph database. With Neo4j, it is done roughly as documented at http://docs.neo4j.org/chunked/milestone/cypher-cookbook-newsfeed.html.

Of course, Google, Facebook, etc., all have their own, internally built or internally modified databases for their unique use cases.
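
As a rough picture of the graph approach, the sketch below walks a user's "follows" edges and merges the already-sorted post lists of the followed users, stopping once enough posts are collected. It does not use the Neo4j or FlockDB APIs; the `follows` and `posts` dictionaries and the `newsfeed` function are made-up stand-ins for a graph store, shown only to illustrate the idea.

```python
import heapq
from itertools import islice

# Hypothetical data: follow edges as an adjacency list, and each user's
# posts stored newest-first as (timestamp, text) tuples.
follows = {"alice": ["bob", "carol"]}
posts = {
    "bob": [(5, "bob: lunch"), (1, "bob: hello")],
    "carol": [(4, "carol: photo"), (2, "carol: hi")],
}


def newsfeed(user_id, limit=10):
    # Step 1: traverse the outgoing "follows" edges of this user.
    followed = follows.get(user_id, [])
    # Step 2: lazily merge the per-user lists, which are already newest-first.
    merged = heapq.merge(*(posts.get(u, []) for u in followed), reverse=True)
    # Step 3: take only `limit` posts instead of sorting everything.
    return [text for _, text in islice(merged, limit)]
```

Here `newsfeed("alice", 3)` yields the three newest posts across both followed users, and because each per-user list can be fetched separately, the same merge would also work when those lists live on different shards.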

Upvotes: 0

Tim B

Reputation: 41208

Essentially all the really big sites have moved away from SQL servers and towards NoSQL in some form or other (several of the really big ones having written their own!). The NoSQL databases relax ACID constraints but as a result are much more able to scale and handle potentially enormous numbers of requests.

If you google NoSQL you will find lots of information about it.

http://blog.3pillarglobal.com/exploring-different-types-nosql-databases

http://www.mongodb.com/learn/nosql

SQL still has its place, but for a lot of things NoSQL is the way forward.

Upvotes: 1

Anna Billstrom

Reputation: 2492

Check out Open Graph: Twitter and Facebook both use this architecture to retrieve "stories" posted by users. It's a version of the semantic web idea. https://developers.facebook.com/docs/opengraph/ The days of SQL calls are over (thank god). FQL, the Facebook Query Language, still works but is largely being deprecated. It's not SQL but a query language that runs against the graph (formerly against databases).

Upvotes: 1
