Maximus
Maximus

Reputation: 2976

Innodb Performance Optimization

One portion of my site requires a bulk insert; it takes around 40 minutes for InnoDB to load that file into the database. I have been digging around the web and found a few things:

  1. innodb_autoinc_lock_mode=2 (it won't generate consecutive keys)
  2. UNIQUE_CHECKS=0; (disable unique key checks)
  3. FOREIGN_KEY_CHECKS=0 (disable foreign key checks)
  4. --log_bin=OFF (turn off the binary log used for replication)
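
For options 2 and 3, the session-scoped form I have in mind looks like this (a sketch; the file path and table name are placeholders, not my real schema):

```sql
-- Disable the checks for this session only, load, then re-enable.
SET SESSION unique_checks = 0;
SET SESSION foreign_key_checks = 0;

LOAD DATA LOCAL INFILE '/path/to/bulk_file.csv'  -- placeholder path
INTO TABLE my_table;                             -- placeholder table

SET SESSION unique_checks = 1;
SET SESSION foreign_key_checks = 1;
```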

Problem

I want to set the first 3 options for just one session, i.e. only during the bulk insert. The first option does not work; MySQL says unknown system variable 'innodb_autoinc_lock_mode'. I am using MySQL 5.0.4.

The last option I would like to turn off, but I am wondering: if I need replication later, will it just start working if I turn it on again?

Suggestions

Any other suggestions on how to improve bulk inserts/updates for the InnoDB engine? Or please comment on my findings.

Thanks

Upvotes: 0

Views: 1376

Answers (2)

MarkR
MarkR

Reputation: 63538

Assuming you are loading the data in a single transaction or a few transactions, most of the time is likely to be spent building indexes (depending on the schema of the table).

Do not do a large number of small inserts with autocommit enabled; that will destroy performance, with a sync for each commit.

If your table is bigger than (or nearly as big as) the InnoDB buffer pool, you are in trouble; a table which can't fit in RAM with its indexes cannot be inserted into efficiently, as it will have to do READS to insert. This is so that existing index blocks can be updated.

Remember that disc writes are ok (they are mostly sequential, and you have a battery-backed raid controller, right?), but reads are slow and need to be avoided.

In summary

  • Do the insert in a small number of big-ish transactions, say 10k-100k rows each. Don't make the transactions too big or you'll exhaust the logs.
  • Get enough RAM that your table fits in memory; set the InnoDB buffer pool appropriately (you are running x86_64, right?)
  • Don't worry about the operation taking a long time: thanks to MVCC, your app will be able to operate on the previous versions of the rows, assuming it's only reading.
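
A minimal sketch of the batched-transaction approach (the table, columns, and two-row batches are illustrative; in practice each batch would be 10k-100k rows):

```sql
-- Commit in big-ish batches rather than once per row.
SET autocommit = 0;

-- First batch (illustrative table/columns; batch would really be ~10k-100k rows)
INSERT INTO my_table (id, val) VALUES (1, 'a'), (2, 'b');
COMMIT;

-- Second batch
INSERT INTO my_table (id, val) VALUES (3, 'c'), (4, 'd');
COMMIT;
```

Each COMMIT is what costs a sync, so amortising it over many rows is the whole point.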

Don't make any of the optimisations listed above; they're probably a waste of time (don't take my word for it: benchmark the operation on a test system in your lab with/without them).

Turning unique checks off is actively dangerous as you'll end up with broken data.

Upvotes: 2

Brian Hooper
Brian Hooper

Reputation: 22034

To answer the last part of your question: no, it won't just start working again; if the inserts are not replicated but subsequent updates are, the result will not be a pretty sight. Disabling foreign and unique key checks should be OK, provided you re-enable them afterwards and deal with any constraint violations.
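
Note that re-enabling the checks does not re-validate rows that were loaded while they were off, so you need to look for violations yourself. One way to find orphaned child rows (table and column names here are illustrative):

```sql
SET SESSION foreign_key_checks = 1;  -- re-enable; existing rows are NOT re-checked

-- Find child rows whose parent is missing (illustrative schema)
SELECT c.id
FROM child c
LEFT JOIN parent p ON c.parent_id = p.id
WHERE p.id IS NULL;
```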

How often do you have to do this? Can you load smaller datasets more frequently?

Upvotes: 0

Related Questions