Adviov

Reputation: 480

Which database for dealing with very large result-sets?

I am currently working on a PHP application (pre-release).

Background

We have a table in our MySQL database which is expected to grow extremely large - it would not be unusual for a single user to own 250,000 rows in this table. Each row in the table is given an amount and a date, among other things.

Furthermore, this particular table is read from (and written to) very frequently - on the majority of pages. Given that each row has a date, I'm using GROUP BY (on the year of the date) to minimise the size of the result-set returned by MySQL - rows that fall in the same year can now be seen as just one total.
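
For concreteness, here is a minimal sketch of the kind of per-year rollup I mean, assuming a hypothetical schema (a transactions table with user_id, amount and date columns) since the real one isn't relevant:

    <?php
    // Hypothetical schema: transactions(user_id, amount, date).
    // Collapse one user's rows into a single total per year.
    $pdo  = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
    $stmt = $pdo->prepare(
        'SELECT YEAR(`date`) AS yr, SUM(amount) AS total
           FROM transactions
          WHERE user_id = :uid
          GROUP BY YEAR(`date`)'
    );
    $userId = 123; // example user
    $stmt->execute(array(':uid' => $userId));
    $totalsByYear = $stmt->fetchAll(PDO::FETCH_ASSOC);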

However, a typical page will still have a result-set of between 1,000 and 3,000 results. There are also places where many SUM()s are performed, totalling many tens - if not hundreds - of thousands of rows.

Trying MySQL

On a typical page, MySQL usually took around 600-900ms. Using LIMIT and offsets didn't help performance, and the data has been heavily normalised, so it doesn't seem like further normalisation would help.

To make matters worse, there are parts of the application which require the retrieval of 10,000-15,000 rows from the database. The results are then used in a calculation by PHP and formatted accordingly. Given this, the performance of MySQL wasn't acceptable.

Trying MongoDB

I have converted the table to MongoDB, and it's noticeably faster - it usually takes around 250ms to retrieve 2,000 documents. However, the $group command in the aggregation pipeline - needed to aggregate fields by the year they fall in - slows things down. Unfortunately, keeping a running total and updating it whenever a document is removed/updated/inserted is also out of the question: although we can use a yearly total for some parts of the app, in other parts the calculations require that each amount falls on a specific date.
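
For reference, this is roughly what that per-year $group looks like with the legacy PHP Mongo driver - the collection and field names are assumptions, and the date field must be stored as a MongoDate for $year to work:

    <?php
    // Hypothetical collection: transactions {user_id, amount, date (MongoDate)}.
    $m   = new MongoClient();
    $col = $m->selectDB('app')->selectCollection('transactions');
    $userId = 123; // example user

    $out = $col->aggregate(array(
        array('$match' => array('user_id' => $userId)),
        array('$group' => array(
            '_id'   => array('$year' => '$date'),   // group key: calendar year
            'total' => array('$sum' => '$amount'),  // yearly total
        )),
    ));
    // On success, $out['result'] holds one document per year.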

I've also considered Redis, although I think the complexity of the data is beyond what Redis was designed for.

The Final Straw

On top of all of this, speed is important, so performance is right up there in terms of priorities.

Questions:

  1. What is the best way to store data which is frequently read/written and rapidly growing, with the knowledge that most queries will retrieve a very large result-set?
  2. Is there another solution to the problem? I'm totally open to suggestions.

I'm a little stuck at the moment; I haven't been able to retrieve such a large result-set in an acceptable amount of time. It seems most datastores are great for small retrievals - even from large amounts of data - but I haven't been able to find anything on retrieving large amounts of data from an even larger table/collection.

Upvotes: 4

Views: 239

Answers (1)

Sammaye

Reputation: 43884

I only read the first two lines, but you are using aggregation (GROUP BY) and then expecting it to run in realtime?

I'll say this not to undermine you, but to try and help: you seem new to the internals of databases.

The group operator in both MySQL and MongoDB works in-memory. In other words, it takes whatever data structure you provide - whether it be an index or a document (row) - and goes through each row/document, taking the field and grouping it up.

This means that you can speed it up in both MySQL and MongoDB by making sure you are using an index for the grouping, but this only goes so far - even with housing the index in your direct working set (memory) in MongoDB.
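
For example (the index names and fields are assumptions based on your question), the supporting indexes might look like this - note that in MongoDB only the $match stage at the front of the pipeline can use the index, while $group itself still walks the matched documents in memory:

    <?php
    // MySQL: composite index so the WHERE + GROUP BY + SUM can be
    // served from the index rather than from the full rows.
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
    $pdo->exec('ALTER TABLE transactions
                ADD INDEX idx_user_date_amount (user_id, `date`, amount)');

    // MongoDB (legacy driver): an index the $match stage can use.
    $m = new MongoClient();
    $m->selectDB('app')->selectCollection('transactions')
      ->ensureIndex(array('user_id' => 1, 'date' => 1));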

In fact, using LIMIT with an OFFSET as well is probably just slowing things down even further, frankly, since after writing out the set MySQL then needs to query again to get your answer.

Once done, the database will write out the result: MySQL writes it to a result set (using memory and IO here), while MongoDB replies inline unless you have set $out, the maximum size of the inline output being 16MB (the maximum size of a document).

The final point to take away here is: aggregation is horrible

There is no silver bullet that will save you here. Some databases will attempt to boast about their speed etc., but the fact is most big aggregators use something called "pre-aggregated reports". You can find a quick introduction in the MongoDB documentation: http://docs.mongodb.org/ecosystem/use-cases/pre-aggregated-reports/

This means that you put the effort of aggregating and grouping onto some other process which can do it easily enough, allowing your reading thread - the one that needs to be realtime - to do its thing in realtime.
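
A minimal sketch of that idea, keeping a yearly counter up to date on every write so reads never aggregate raw rows (the collection and field names here are assumptions for illustration):

    <?php
    // Hypothetical pre-aggregated collection: yearly_totals.
    // Call this whenever a transaction is inserted; use -$amount on removal.
    $m      = new MongoClient();
    $totals = $m->selectDB('app')->selectCollection('yearly_totals');

    $userId = 123;                          // example values
    $amount = 9.99;
    $date   = new DateTime('2013-06-01');

    $totals->update(
        array('user_id' => $userId, 'year' => (int) $date->format('Y')),
        array('$inc' => array('total' => $amount)),
        array('upsert' => true) // create the year's document on first write
    );
    // Realtime pages now read a handful of tiny documents instead of
    // grouping hundreds of thousands of rows at request time.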

Upvotes: 2
