user1236552
user1236552

Reputation: 33

Best database for multi million row store/query

We have a database that has been growing for around 5 years. The main table has near 100 columns and 700 million rows (and growing).

The common use case is to count how many rows match a given criteria, that is: select count(*) where column1='TypeA' and column2='BlockC'.

The other use case is to retrieve the rows that match a criteria.

The queries started by taking a bit of time, now they take a couple of minutes. I want to find some DBMS that allows me to make the two use cases as fast as possible.

I've been looking into some Column store databases and Apache Cassandra but still have no idea what is the best option. Any ideas?

Upvotes: 1

Views: 1093

Answers (1)

Neil McGuigan
Neil McGuigan

Reputation: 48287

Update: these days I'd recommend Hive 3 or PrestoDB for big data analysis

I am going to assume this is an analytic (historical) database with no current data. If not, you should consider separating your dbs.

You are going to want a few features to help speed up analysis:

  • Materialized views. This is essentially pre-calculating values, and then storing the results for later analysis. MySQL and Postgres (coming soon in Postgres 9.3) do not support this, but you can mimic with triggers.

  • easy OLAP analysis. You could use Mondrian OLAP server (java), but then Excel doesn't talk to it easily, but JasperSoft and Pentaho do.

  • you might want to change the schema for easier OLAP analysis, ie the star schema. Good book:

http://www.amazon.com/Data-Warehouse-Toolkit-Complete-Dimensional/dp/0471200247/ref=pd_sim_b_1

If you want open source, I'd go Postgres (doesn't choke on big queries like mysql can), plus Mondrian, plus Pentaho.

If not open source, then best bang for buck is likely Microsoft SQL Server with Analysis Services.

Upvotes: 1

Related Questions