Reputation: 1252
I am developing an application to provide daily dynamic information, such as prices and availability, for around 50,000 objects. I need to store data for about the next 200 days, which means a total of roughly 10 million rows. Prices will be batch updated and new data will be added once daily: say about 10,000 existing rows get updated and 50,000 rows are inserted each day. What is the best database backend that I can use?
Can MySQL scale to this on limited hardware? Or is a NoSQL database the way to go? If so, which NoSQL database would be best suited for fast fetching and updating of the data?
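For reference, here is a rough sketch of the kind of table I have in mind (names and types are just placeholders):

```sql
-- One row per object per day (~50,000 x 200 = 10 million rows)
CREATE TABLE object_daily (
    object_id    INT UNSIGNED NOT NULL,
    price_date   DATE         NOT NULL,
    price        DECIMAL(10,2),
    available    TINYINT(1)   NOT NULL DEFAULT 1,
    updated_at   TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP
                              ON UPDATE CURRENT_TIMESTAMP,
    PRIMARY KEY (object_id, price_date)
);
```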
Upvotes: 1
Views: 718
Reputation: 12571
This type of volume is well within the capabilities of a traditional RDBMS, so if you are familiar with MySQL, you are safe sticking with it. A lot also depends on what kind of queries you want to run. With a properly structured, normalized setup, you can run ad hoc queries in an RDBMS, whereas with document stores you need to think quite carefully about structure up front -- embedding versus referencing; see: MongoDB relationships: embed or reference?. MongoDB has added a very nice aggregation framework, which goes a long way towards letting you query data as you would in an RDBMS, but in many other NoSQL systems queries are essentially map-reduce jobs, and joins are either painful or impossible.
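For example, a one-off question like the following is a single SQL statement (using the illustrative table from the question), whereas in a map-reduce-oriented store it would typically mean writing and running a job:

```sql
-- Ad hoc query: average price and number of available objects
-- per day for the coming week, across all 50,000 objects
SELECT price_date,
       AVG(price)     AS avg_price,
       SUM(available) AS objects_available
FROM object_daily
WHERE price_date BETWEEN CURDATE() AND CURDATE() + INTERVAL 7 DAY
GROUP BY price_date
ORDER BY price_date;
```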
It sounds like your data is structured around dates/days. One thing you can do that will yield dramatic speed improvements on queries is partitioning by date ranges. I have worked on MySQL databases of over 100 million rows where historical data had to be kept for auditing purposes, but where most of the reads/writes hit current data, and partitioning led to truly dramatic improvements in read query performance.
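As a sketch, assuming a table keyed by object and date as in the question, monthly range partitioning in MySQL might look like this (partition names and date boundaries are illustrative):

```sql
-- Monthly range partitions on the date column.
-- Note: MySQL requires the partitioning column to be part of every
-- unique key, which the (object_id, price_date) primary key satisfies.
ALTER TABLE object_daily
PARTITION BY RANGE (TO_DAYS(price_date)) (
    PARTITION p2014_01 VALUES LESS THAN (TO_DAYS('2014-02-01')),
    PARTITION p2014_02 VALUES LESS THAN (TO_DAYS('2014-03-01')),
    PARTITION p2014_03 VALUES LESS THAN (TO_DAYS('2014-04-01')),
    PARTITION pmax     VALUES LESS THAN MAXVALUE
);
```

Queries that filter on price_date can then be pruned to the relevant partitions instead of scanning the whole table.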
You might be interested in this link, which shows what some very high volume sites are using: What databases do the World Wide Web's biggest sites run on? Anecdotally, I know that Facebook had trillions of rows in MySQL across various clusters before they started hitting real bottlenecks, but it is no surprise that Cassandra ultimately came out of Facebook engineering, given the truly colossal data volumes they now handle.
Cassandra, Riak, CouchDB, MongoDB, etc. all arose to solve very real problems, but they come with tradeoffs, both in terms of the CAP theorem and in terms of ad hoc queries being more difficult than in an RDBMS. Having said that, MongoDB and Cassandra (which I have the most experience with) are easy to set up and fun to work with, so if you want to give them a go, I'm sure you will have no problems, but I would say your usage requirements are well within the capabilities of MySQL. Just my 2c.
Upvotes: 2
Reputation: 709
I would recommend you use Cassandra, as your workload is more write-heavy than read-heavy, and Cassandra is optimized for high write throughput.
It provides scalability, no single point of failure, and high throughput. And you can update records as well.
Cassandra also supports batch operations for DML (data manipulation language) statements, i.e. insert, update, and delete, and a Cassandra logged batch provides atomicity as well.
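A minimal CQL sketch of such a logged batch (table and column names are assumed, not from your question):

```sql
-- Cassandra table: partitioned by object, clustered by date
CREATE TABLE object_daily (
    object_id  int,
    price_date date,
    price      decimal,
    available  boolean,
    PRIMARY KEY (object_id, price_date)
);

-- Logged batch: either all statements are eventually applied, or none
BEGIN BATCH
    INSERT INTO object_daily (object_id, price_date, price, available)
    VALUES (42, '2014-06-01', 19.99, true);
    UPDATE object_daily SET price = 17.50
    WHERE object_id = 42 AND price_date = '2014-06-02';
    DELETE FROM object_daily
    WHERE object_id = 42 AND price_date = '2014-06-03';
APPLY BATCH;
```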
Upvotes: 2