NightWolf

Reputation: 7794

Streaming Web Application - Twitter, Facebook, NoSQL or SQL?

So we have a design challenge: we have an absolutely clean slate to develop a system that presents the processing results of various social networking feeds, such as Twitter and Facebook, on the web and via an API service (for example, REST). The processing part is already complete; we now need somewhere to store the results.

Each result looks something like this: a message ID, the date of the message, the processed timestamp, and a collection of various processing scores. There will be around 200 million messages in this database, so the first thing we need is something to store this data. We are thinking a NoSQL document database might be interesting to try, given that we need to select over a range of dates, which I think rules out column-family databases (I believe key range scanning in HBase is slow). Alternatively, the better option may be to simply store this data in good old MySQL or VoltDB. Does anyone have example use cases or stories about implementing such a system?
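To make the storage requirement concrete, here is a minimal sketch of the record shape and the date-range query, using Python's built-in `sqlite3` purely as a stand-in for MySQL or any other store. The table name, column names, the `sentiment` score and the sample rows are all assumptions made up for illustration; they are not our real schema or data.

```python
import json
import sqlite3

# In-memory database used only for illustration; a real deployment would
# point at MySQL, VoltDB, or whichever store is chosen.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE results (
        message_id   TEXT PRIMARY KEY,
        message_date TEXT NOT NULL,   -- ISO 8601, so string order matches time order
        processed_at TEXT NOT NULL,
        scores       TEXT NOT NULL    -- JSON-encoded collection of scores
    )
    """
)
# The index is what keeps range scans over message_date cheap.
conn.execute("CREATE INDEX idx_results_message_date ON results (message_date)")

# Hypothetical sample rows.
rows = [
    ("m1", "2012-03-01T10:00:00", "2012-03-01T10:00:05", {"sentiment": 0.8}),
    ("m2", "2012-03-02T09:30:00", "2012-03-02T09:30:04", {"sentiment": -0.2}),
    ("m3", "2012-03-05T18:45:00", "2012-03-05T18:45:07", {"sentiment": 0.1}),
]
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?, ?)",
    [(mid, date, processed, json.dumps(scores)) for mid, date, processed, scores in rows],
)


def messages_between(start: str, end: str) -> list[str]:
    """Return message IDs with start <= message_date < end, oldest first."""
    cursor = conn.execute(
        "SELECT message_id FROM results "
        "WHERE message_date >= ? AND message_date < ? "
        "ORDER BY message_date",
        (start, end),
    )
    return [row[0] for row in cursor]
```

Calling `messages_between("2012-03-01T00:00:00", "2012-03-03T00:00:00")` selects the first two sample rows. Whatever store we pick has to answer this kind of half-open range query efficiently at 200 million rows.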

The next thing will be to develop a web application. We need a charting service that can take data in real time and update the interface. We are thinking of using Highcharts for this purpose. Is there anything better?

Finally, we need some sort of API service that can act like a Comet application and stream data, something like Twitter's streaming API. I was thinking the best option for this would be Node.js.

So I guess the question is: are the technologies we have selected the best for the job, are there any good example use cases out there, and is there anything anyone would recommend?

Cheers!

Upvotes: 1

Views: 875

Answers (2)

Prasenjit Mukherjee

Reputation: 91

You can also use Solr/Lucene with sharding. Throughput can be increased with a master/slave Solr setup.

Upvotes: 0

Ivan

Reputation: 2262

About storage: there are four types of NoSQL storage: key/value, column, document and graph databases. Roughly speaking, each one is slower than the previous one but gives you more features. If you only need to store data, a key/value or column database is your choice. With this type of storage, data processing is done by hand, and you may need some kind of MapReduce implementation, maybe Hadoop. Document and graph databases give you some kind of query language, so you can move part of the data processing into the database (e.g. date filters). If I had to choose a NoSQL store, I would run tests with a graph database (e.g. Neo4j) and, if I hit performance issues, switch to a column database (e.g. Cassandra) plus MapReduce.
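To illustrate what "data processing done by hand" can mean, here is a rough single-process sketch of the MapReduce idea in plain Python: the map step emits a (day, score) pair per message and the reduce step averages them per day. The field names and the `sentiment` score are assumptions for the example, and a real Hadoop job would distribute both phases across machines instead of running them in one loop.

```python
from collections import defaultdict


def map_phase(records):
    """Emit (day, (score, 1)) for every record; the day is the date part of the timestamp."""
    for record in records:
        day = record["message_date"][:10]
        yield day, (record["scores"]["sentiment"], 1)


def reduce_phase(pairs):
    """Sum scores and counts per day, then return the average score per day."""
    totals = defaultdict(lambda: [0.0, 0])
    for day, (score, count) in pairs:
        totals[day][0] += score
        totals[day][1] += count
    return {day: total / count for day, (total, count) in totals.items()}


# Hypothetical records, as they might come back from a key/value or column store.
records = [
    {"message_date": "2012-03-01T10:00:00", "scores": {"sentiment": 0.5}},
    {"message_date": "2012-03-01T17:20:00", "scores": {"sentiment": 1.5}},
    {"message_date": "2012-03-02T09:30:00", "scores": {"sentiment": 0.25}},
]

daily_average = reduce_phase(map_phase(records))
```

With a document database, a date filter like this could instead run inside the database, which is the trade-off described above.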

About charts: Highcharts seems like a good option. I don't know about SVG browser support or whether there are performance issues, but on my machine it looks very nice.

About data streaming: I have a little experience, and only with Node.js, so it would be my first choice. There are a few other implementations, such as Tornado for Python and Misultin, MochiWeb and Cowboy for Erlang. I found a link with a benchmark of these servers, and it seems the Erlang servers are faster than Node.js. You could look at them as well.
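Whichever server you pick, the core of a streaming endpoint is the same: every new result has to be fanned out to all open connections. Below is a minimal sketch of that fan-out using only Python's standard `asyncio`; the `Broadcaster` class, the queue size and the message fields are my own assumptions for illustration, and the actual HTTP layer (Node.js, Tornado, Cowboy, ...) is deliberately left out.

```python
import asyncio
import json


class Broadcaster:
    """Fan out each published result to every subscribed connection."""

    def __init__(self) -> None:
        self._subscribers: set = set()

    def subscribe(self) -> asyncio.Queue:
        # One bounded queue per connection, so a slow client cannot grow memory forever.
        queue: asyncio.Queue = asyncio.Queue(maxsize=100)
        self._subscribers.add(queue)
        return queue

    def unsubscribe(self, queue: asyncio.Queue) -> None:
        self._subscribers.discard(queue)

    def publish(self, result: dict) -> None:
        # Newline-delimited JSON: one line per message.
        line = json.dumps(result) + "\n"
        for queue in list(self._subscribers):
            try:
                queue.put_nowait(line)
            except asyncio.QueueFull:
                # Slow consumer: drop it rather than block everyone else.
                self._subscribers.discard(queue)


async def demo() -> list:
    """Two subscribers both receive the same published line."""
    hub = Broadcaster()
    first, second = hub.subscribe(), hub.subscribe()
    hub.publish({"message_id": "m1", "scores": {"sentiment": 0.8}})
    return [await first.get(), await second.get()]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In a real server, each connection handler would call `subscribe()`, write every line it gets from its queue to the open response, and call `unsubscribe()` when the client disconnects.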

Upvotes: 2
