Trick Jarrett

Reputation: 3577

What DB for big databases?

I'm embarking on a project which will likely cross several million rows in the near future, so I am researching which database to use, as that is sure to become an issue. From what I have read, SQL in all its incarnations has issues once a table reaches around 2,000,000 rows. Is there a good database recommended for projects of this size?

It is a website I am talking about, and archiving old entries is not ideal, though can be done if it proves to be an issue that I can't overcome.

Thanks.

Upvotes: 4

Views: 7255

Answers (14)

Gerco Dries

Reputation: 6712

First off, a million records is not exactly a lot where databases are concerned. Any database worth its salt should be able to handle that just fine.

Create proper indexes on your tables and almost any database will be able to handle those numbers of records. I've seen MySQL databases with millions of rows that worked just fine, and MySQL is not a heavyweight in database land.

MS SQL Server, PostgreSQL, DB2, Progress OpenEdge - almost anything will do if you create proper indexes. Things like MS Access (and possibly sqlite) may fall apart when you put a lot of data in them.
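To make the "create proper indexes" advice concrete, here is a minimal, illustrative sketch using Python's built-in sqlite3 module (the table and column names are invented for the example, not taken from the question). It shows the query plan changing from a full table scan to an index search once an index exists on the filtered column:

```python
import sqlite3

# In-memory database purely for illustration; all names here are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, author TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO entries (author, body) VALUES (?, ?)",
    ((f"user{i % 1000}", "...") for i in range(100_000)),
)

query = "SELECT count(*) FROM entries WHERE author = 'user42'"

# Without an index, the plan is a full table scan.
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchone()[3]

# With an index on the filtered column, it becomes an index search.
conn.execute("CREATE INDEX idx_entries_author ON entries (author)")
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchone()[3]

print(plan_before)  # a SCAN of the table
print(plan_after)   # a SEARCH using idx_entries_author
```

The same idea carries over to every engine mentioned here; only the plan-inspection syntax (EXPLAIN, SHOWPLAN, etc.) differs.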

Upvotes: 4

Pete

Reputation: 12553

As a lot of people have already said, that number of records is not a problem if your database design is done properly.

But there may be another aspect worth considering: how many users, and in particular how many simultaneous users, do you expect your application to have? If you expect a lot of users, you also need to consider the scalability of the database engine and of your database design.

MS SQL Server may not be expensive for a single-server setup, but if you need to scale up, e.g. run on 4 CPUs, the licensing becomes very expensive. And when you have pushed the limits of a single server and need to scale out to multiple servers, what do you do? I don't have the answer to that, except that as far as I know, MS SQL Server does not directly support load balancing.

Just a thought

Upvotes: 0

Jon Onstott

Reputation: 13727

Keep in mind that if you have a large amount of data:

  • indexing columns you join tables on is ESPECIALLY important
  • writing efficient queries can make a huge difference
  • if you query data all the time and rarely write new rows, you can create clustered indexes and materialized views to retrieve data much more efficiently, based on what queries you use most often
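For engines without native materialized views, the same read-mostly optimization can be approximated with a precomputed summary table that is refreshed on a schedule. A minimal sqlite3 sketch (the schema and names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (page TEXT, viewed_at TEXT)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [("home", "2009-07-01"), ("home", "2009-07-01"), ("about", "2009-07-02")],
)

# The "materialized view": an aggregate precomputed into its own table, so
# read-heavy pages hit a few summary rows instead of scanning the raw log.
conn.execute("CREATE TABLE view_counts (page TEXT PRIMARY KEY, views INTEGER)")

def refresh_view_counts(conn):
    # Rebuild the summary; in production this would run on a schedule.
    conn.execute("DELETE FROM view_counts")
    conn.execute(
        "INSERT INTO view_counts SELECT page, count(*) FROM page_views GROUP BY page"
    )

refresh_view_counts(conn)
counts = dict(conn.execute("SELECT page, views FROM view_counts"))
print(counts)  # {'home': 2, 'about': 1}
```

SQL Server's indexed views and Oracle/PostgreSQL materialized views give you this same pattern natively, with the engine handling the refresh.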

Upvotes: 0

Terence

Reputation: 347

Properly configured, 2 million rows is not a big deal for most of the commercial DBs, and may not be for the open-source DBs either - I don't know enough about MySQL et al. to have an opinion.

By SQL I assume the original poster means MS SQL Server. While there were some scaling issues in the 2000 release, they seem to have been mostly addressed in 2005 and 2008. I have one test DB with significantly more than 2 million rows running now, and running quite well.

Respectfully, I think the question is badly stated - you need to provide much more information to get a useful answer: size of the database, number of tables, number of common joins, whether it will be optimized for reads, writes, or both, number of concurrent users to be supported, replication, geographic location of end users vs. the database server, and hardware configuration.

In general, I have found that SQL Server post-2005 works very well in a lot of cases. If you need the ability to tune everything at the lowest level, both Oracle and DB2 give you better access and documentation to do that.

If your need is primarily a data warehouse and you have the cash, then I would look at Netezza or Teradata. I am a fan of NZ, but we are partners, so I am biased.

Hope that helps,

Terence

Upvotes: 0

user158017

Reputation: 2991

I concur with richardtallent. The big name database systems have all provided us with good tools for large databases. (2 million rows is nothing, although you can see performance issues with lousy indexes or poor choices in the select statements, especially if you are joining across multiple tables of similar size.). It comes down to pros and cons with costs, usability, cost of support, etc.

I can speak most to Oracle and SQL Server. Oracle is pretty pricey, and it takes a pricey, dedicated DBA to really use it right. It isn't known for usability, but a DBA or programmer comfortable with it can work just fine in it. It also has great flexibility and some believe it is more powerful than the others. (I don't know if that's true or not, but I know it certainly provides lots of different ways you can tweak it for efficiency, etc.)

SQL Server can certainly handle large datasets just fine. It has a "prettier" face and tends to be considered more usable, but usability in the end is a matter of opinion. It does have a cheaper price tag, but you might have just a bit less flexibility than Oracle. You can get a "cheap" SQL Server dba, because its user-friendly interface makes it easy for people to do some of the basic DBA tasks without being experts. But you get what you pay for (usually) and if you really want efficiency and security, you pay for an expert anyway.

Those are just a few of the things to consider when looking at DBs. I'm sure MySQL and DB2 have their own pros and cons to be weighed.

But none of them have a problem with a measly 2 million rows. (I regularly work in a database with hundreds of tables, some of which have over 50 million rows, and I see little performance hit b/c the DBAs know what they are doing.)

FOLLOW UP EDIT: Since this is for a website, maybe your biggest consideration should be integration of front/back. For example, if you are using ASP for the web, SQL Server is a natural choice.

Upvotes: 1

rmeador

Reputation: 25694

As others have said, any decent DB can handle that sort of load. I've used MS SQL Server and PostgreSQL for databases of that size before, both work great. I'd recommend PostgreSQL because it's free and open. I've never done a performance comparison, but it seems to be very capable. I'd avoid DB2 or Oracle because they're very hard to use (unless you want to pay for a full time DBA, in which case such a person might be able to squeeze better performance out of those than any other solution, especially with Oracle).

Upvotes: 1

richardtallent

Reputation: 35374

Microsoft SQL Server, MySQL, Oracle, and DB2 can all handle millions and millions of rows without a problem.

The problem will be finding a DBA who knows how to design and manage it properly so you get the performance characteristics you're looking for.

Upvotes: 3

MatthewMartin

Reputation: 33143

Try looking at other large organizations to see what they're using. Microsoft's proof of concept for very large databases is TerraServer, which runs a database several terabytes in size.

Any database will have problems with a small dataset if you are doing table scans, cartesian products, expensive calculations for each row, etc.

To really stress a relational db with a table of 2 million rows, you'd have to be doing cross tabs while doing a large number of inserts and updates and then you'd want to switch to an olap datastore.
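One of the "problems with a small dataset" mentioned above, the accidental cartesian product, is easy to demonstrate; here is an invented sqlite3 sketch of what a missing join condition does to the row count:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
    INSERT INTO users VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO posts VALUES (10, 1, 'x'), (11, 2, 'y');
""")

# Missing join condition: every user is paired with every post (3 * 2 rows).
bad = conn.execute("SELECT count(*) FROM users, posts").fetchone()[0]

# Proper join condition: only matching rows come back.
good = conn.execute(
    "SELECT count(*) FROM users JOIN posts ON posts.user_id = users.id"
).fetchone()[0]

print(bad, good)  # 6 2
```

With two 2-million-row tables, the same mistake produces 4 trillion intermediate rows, which is why query shape matters far more than raw row count.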

Do you have anything else to describe the expected workload? Is this mostly readonly, read write, etc?

Upvotes: 0

Mark

Reputation: 11740

We run lots of databases with row counts in the hundreds of millions in MSSQL (2000, 2005, 2008). Your row count isn't where your problem will arise, it's in the characteristics of access to the data. Depending on how it looks, you may need to scale across separate hardware, and that is where the differences between database servers will really show up (that and price...)

Upvotes: 3

jere

Reputation:

2,000,000 rows is really not much at all. I've seen plenty of tables with > 50 million rows with acceptable performance, in MS SQL.

IMHO you're still pretty far away from being a 'big database'

Upvotes: 1

leeeroy

Reputation: 11446

No database would call itself an SQL database if it had issues with 2 million records. You can get into trouble with some databases at 2 billion records, though.

I've had MySQL databases with well over 150 million records without trouble. You need to figure out what features you need from a database before deciding, not ponder over a few million rows, which is not much at all.

Upvotes: 6

RHicke

Reputation: 3604

For most apps MS SQL Server will work fine, and MySQL will work for smaller apps. But to answer your question: if you are truly concerned about DB performance, I would go with Oracle if you can afford it. If you are like most of us, who can't use an $80,000 database, I would suggest MS SQL Server; it works well. By the sound of what you are doing (a website), I would use MS SQL Server and utilize caching. Using the database correctly tends to be more important than using the correct database.
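The caching point deserves a sketch: for a website, most page views can be served without touching the database at all. Here is a deliberately minimal, made-up time-based cache in Python (not anyone's production code, just the shape of the idea):

```python
import time

# A minimal TTL cache, purely illustrative: the point is that a website
# rarely needs to hit the database on every single page view.
_cache = {}

def cached_query(key, fetch, ttl_seconds=60):
    """Return a cached result for `key`, calling `fetch()` only on expiry."""
    now = time.monotonic()
    hit = _cache.get(key)
    if hit is not None and now - hit[0] < ttl_seconds:
        return hit[1]
    value = fetch()
    _cache[key] = (now, value)
    return value

calls = []
def expensive():
    calls.append(1)  # stands in for a real database query
    return "front page data"

a = cached_query("front_page", expensive)
b = cached_query("front_page", expensive)
print(a == b, len(calls))  # True 1  (second call served from cache)
```

In practice you would reach for something like memcached or your framework's output caching rather than rolling your own, but the effect on database load is the same.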

Upvotes: 0

BoltBait

Reputation: 11489

One of the tables in my current project has 13 million rows in it. MS SQL Server handles it just fine. Really, 2 million rows is nothing.

But, seriously, if you want a high-end database, look to Oracle, Teradata, and DB2.

Upvotes: 3

Thorarin

Reputation: 48486

I've had tables in MS SQL Server with a fair bit more than 2 million rows without trouble. Of course, it depends on how you're using that data.

Just don't try using MySQL for something like this. At least in my experience, it just doesn't allow enough tweaking to provide high enough performance. I've run into a few cases with large amounts of data in (almost) identically set up tables, and MySQL 5 performed something like 30 times slower than SQL Server on the same hardware. An extreme example, maybe, but still.

I have too little experience with PostgreSQL or Oracle to judge, so I will just stick with not recommending MySQL. Or Access ;)

Upvotes: 3
