Mark

Reputation: 10964

Best database engine for huge datasets

I do data mining, and my work involves loading and unloading 1GB+ database dump files into MySQL. I am wondering whether there is any other free database engine that works better than MySQL on huge databases. Is PostgreSQL better in terms of performance?

I only use basic SQL commands, so speed is the only factor in choosing a database.

Upvotes: 4

Views: 6749

Answers (7)

Adamski

Reputation: 54705

It is unlikely that substituting a different database engine will provide a huge increase in performance. The slowdown you mention is more likely related to your schema design and data access patterns. Maybe you could provide some more information about those? For example, is the data stored as a time series? Are records written once sequentially, or inserted / updated / deleted arbitrarily?

Upvotes: 4

Joseph Turian

Reputation: 16540

If you are doing data mining, perhaps you could use a document-oriented database. These can be faster than relational databases if you do not rely heavily on SQL.

MongoDB and CouchDB are both good options. I prefer MongoDB because I don't know Java, and I found it easier to get up and running.


Upvotes: 1

Greg Smith

Reputation: 18136

Your question is too ambiguous to answer usefully. "Performance" means many different things to different people. I can comment on how MySQL and PostgreSQL compare in a few areas that might be important, but without more information it's hard to say which of these actually matter to you. I've written up a bunch more background information on this topic at Why PostgreSQL Instead of MySQL: Comparing Reliability and Speed. Which is faster certainly depends on what you're doing.

Is the problem that loading data into the database is too slow? That's one area PostgreSQL doesn't do particularly well at; the COPY command in Postgres is not a particularly fast bulk-loading mechanism.
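For illustration, here is a rough sketch of a bulk load on each side (the table name and file path below are only placeholders, and the exact options depend on your dump format):

    -- PostgreSQL: COPY is its built-in bulk loader
    COPY events FROM '/tmp/events.csv' WITH CSV;

    -- MySQL: LOAD DATA INFILE is usually the quickest way to pull in a flat file
    LOAD DATA INFILE '/tmp/events.csv'
    INTO TABLE events
    FIELDS TERMINATED BY ',';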

Is the problem that queries run too slowly? If so, how complicated are they? On complicated queries, the PostgreSQL optimizer can do a better job than the one in MySQL, particularly if there are many table joins involved. Small, simple queries tend to run faster in MySQL because it isn't doing as much thinking about how to execute the query before beginning; smarter execution costs a bit of overhead.
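If you want to see what each planner does with a join-heavy query, EXPLAIN works in both systems (the tables and columns here are only placeholders):

    -- Compare the plans each optimizer produces for a multi-join query
    EXPLAIN
    SELECT c.name, SUM(o.total)
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    JOIN order_items i ON i.order_id = o.id
    GROUP BY c.name;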

How many clients are involved? MySQL can do a good job with a small number of clients; at higher client counts, the locking mechanism in PostgreSQL might do a better job.

Do you care about transactional integrity? If not, it's easier to turn more of those features off in MySQL, which gives it a significant speed advantage compared to PostgreSQL.
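On MySQL, for example, the usual knobs for trading durability for load speed look like this (illustrative settings; only use them if losing the last second or so of transactions is acceptable):

    -- Flush the InnoDB log roughly once per second instead of at every commit
    SET GLOBAL innodb_flush_log_at_trx_commit = 0;
    -- Let the OS decide when to sync the binary log
    SET GLOBAL sync_binlog = 0;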

Upvotes: 1

ConcernedOfTunbridgeWells

Reputation: 66612

If your data mining tool supports it, consider working from flat-file sources. This should spare you most of your import/export operations. It does have some caveats, though:

  • You may need to get proficient with a scripting language like Perl or Python to do the data munging (assuming you're not already familiar with one).

  • You may need to expand the memory on your computer, or move to a 64-bit platform if you need to address more memory.

  • Your data mining tool might not support working from flat data files in this manner, in which case you're buggered.

Modern disks - even SATA ones - will pull 100MB/sec or so off the disk in sequential reads, which means something could inhale a 1GB file in roughly ten seconds.

Alternatively, you could try getting SSDs on your machine and see if that improves the performance of your DBMS.

Upvotes: 2

M. Utku ALTINKAYA

Reputation: 2272

As long as you drop indexes before inserting huge amounts of data, there should not be much difference between the two.
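A rough sketch of that pattern, with placeholder table and index names (MySQL spells it DROP INDEX ... ON table or ALTER TABLE ... DROP INDEX instead):

    -- Drop secondary indexes, bulk load, then rebuild the indexes afterwards
    DROP INDEX idx_events_user;
    -- ... run the bulk load here (COPY, LOAD DATA INFILE, or a dump restore) ...
    CREATE INDEX idx_events_user ON events (user_id);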

Upvotes: 3

akrasikov

Reputation: 36

I am using PostgreSQL with my current project and also have to dump/restore databases pretty often. It takes less than 20 minutes to restore a 400MB compressed dump. You may give it a try, although some server configuration parameters need to be tweaked to match your hardware configuration. These parameters include, but are not limited to:

  • shared_buffers
  • work_mem
  • temp_buffers
  • maintenance_work_mem
  • commit_delay
  • effective_cache_size

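As a rough illustration, a postgresql.conf excerpt touching those parameters might look like this (the values are only examples; size them to your RAM and workload):

    # postgresql.conf (illustrative values only)
    shared_buffers = 512MB           # main database cache
    work_mem = 64MB                  # per-sort / per-hash memory for queries
    temp_buffers = 32MB              # per-session temporary table memory
    maintenance_work_mem = 256MB     # helps index rebuilds during a restore
    commit_delay = 1000              # microseconds to wait for group commit
    effective_cache_size = 2GB       # planner hint about the OS file cache
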
Upvotes: 1

Joonas Pulakka

Reputation: 36577

HDF is the storage choice of NASA's Earth Observing System, for instance. It's not exactly a database in the traditional sense, and it has its own quirks, but in terms of pure performance it's hard to beat.

Upvotes: 2
