Reputation: 4468
My experience with databases is with fairly small web applications, but now I'm working with a dataset of voter information for an entire state. There are approximately 3.5m voters and I will need to do quite a bit of reporting on them based on their address, voting history, age, etc. The web application itself will be written with Django, so I have a few choices of database including MySQL and PostgreSQL.
In the past I've almost exclusively used MySQL since it was so easily available. I realize that 3.5m rows in a table isn't really all that much, but it's the largest dataset I've personally worked with, so I'm out of my personal comfort zone. Also, this project isn't a quickie throw-away application though, so I want to make sure I choose the best database for the job and not just the one I'm most comfortable with.
If MySQL is an appropriate tool for the job I would also like to know if it makes sense to use InnoDB or MyISAM. I understand the basic differences between the two, but some sources say to use MyISAM for speed but InnoDB if you want a "real" database, while others say all modern uses of MySQL should use InnoDB.
Thanks!
Upvotes: 5
Views: 2545
Reputation: 11581
If you need to compute reports and complex aggregates, be aware that postgres' query optimizer is rather smart and ingenious, wether the mysql "optimizer" is quite simple and dumb.
On a big join the difference can be huge.
The only advantage MySQL has is that it can hit the indexes without hitting the tables.
You should load your dataset in both databases and experiment the biger queries you intend to run. It is better to spend a few days of experimenting, rather than be stuck with the wrong choice.
Upvotes: 0
Reputation: 32346
Since it's a read-heavy table, I will recommend using MyISAM table type. If you do not use foreign keys, you can avoid the bugs like this and that.
Backing up or copying the table to another server is as simple as coping frm, MYI and MYD files.
Upvotes: 1
Reputation: 18146
MyISAM only makes sense if you need speed so badly that you're willing to accept many data integrity issues downsides to achieve it. You can end up with database corruption on any unclean shutdown, there's no foreign keys, no transactions, it's really limited. And since 3.5 million rows on modern hardware is a trivial data set (unless your rows are huge), you're certainly not at the point where you're forced to optimize for performance instead of reliability because there's no other way to hit your performance goals--that's the only situation where you should have to put up with MyISAM.
As for whether to choose PostgreSQL instead, you won't really see a big performance difference between the two on an app this small. If you're familiar with MySQL already, you could certainly justify just using it again to keep your learning curve down.
I don't like MySQL because there are so many ways you can get bad data into the database where PostgreSQL is intolerant of that behavior (see Comparing Speed and Reliability), the bad MyISAM behavior is just a subset of the concerns there. Given how fractured the MySQL community is now and the uncertainties about what Oracle is going to do with it, you might want to consider taking a look at PostgreSQL just so you have some more options here in the future. There's a lot less drama around the always free BSD licensed PostgreSQL lately, and while smaller at least the whole development community for it is pushing in the same direction.
Upvotes: 6
Reputation: 7970
I've run DB's far bigger than this on mysql- you should be fine. Just tune your indexes carefully.
InnoDB supports better locking semantics, so if there will be occasional or frequent writes (or if you want better data integrity), I'd suggest starting there, and then benchmarking myisam later if you can't hit your performance targets.
Upvotes: 7