Mr rain

Reputation: 1003

How to get high performance for a large transaction (PostgreSQL)

I have about 2 million rows to insert into PostgreSQL, but the insert performance is poor. Can I get better performance by splitting the large transaction into smaller ones (I would rather not do this)? Or is there some other, wiser solution?

Upvotes: 4

Views: 12562

Answers (3)

Mikko Rantalainen

Reputation: 16025

Possible methods to improve performance:

  1. Use the COPY command (there is a short SQL sketch after this list).
  2. Try to decrease the isolation level for the transaction if your data can deal with the consequences.
  3. Tweak the PostgreSQL server configuration. The default memory limits are very low and will cause disk thrashing even on a server with gigabytes of free memory.
  4. Turn off disk barriers (e.g. the nobarrier flag for the ext4 file system) and/or fsync on the PostgreSQL server. Warning: this is usually unsafe but will improve your performance a lot.
  5. Drop all the indexes on your table before inserting the data. Some indexes require quite a lot of work to keep up to date while rows are added. PostgreSQL may be able to create the indexes faster at the end instead of continuously updating them in parallel with the insertion process. Unfortunately, there's no simple way to "save" the current indexes and recreate only the missing ones later.
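
A minimal sketch of points 1, 3 and 5 in plain SQL, assuming a hypothetical items table, an items_name_idx index and a server-readable CSV file (all of these names are made up for illustration):

    -- Point 3: session-level settings. maintenance_work_mem speeds up the
    -- index rebuild; synchronous_commit = off skips waiting for fsync at
    -- commit time (small risk of losing the very latest transactions on a
    -- crash). fsync itself (point 4) can only be changed in postgresql.conf.
    SET maintenance_work_mem = '512MB';
    SET synchronous_commit = off;

    BEGIN;
    -- Point 5: drop secondary indexes before the bulk load...
    DROP INDEX IF EXISTS items_name_idx;

    -- Point 1: COPY loads the whole file in one statement instead of
    -- millions of individual INSERTs.
    COPY items (id, name, price) FROM '/tmp/items.csv' WITH (FORMAT csv);

    -- ...and rebuild them once, after all rows are in place.
    CREATE INDEX items_name_idx ON items (name);
    COMMIT;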

Splitting the insert job into a series of smaller transactions will help only if you have to retry a transaction because of data dependency issues with parallel transactions. If the transaction succeeds on the first try, splitting it into several smaller transactions run in sequence will only decrease your performance.

Upvotes: 7

zambo

Reputation: 43

In my experience you CAN improve INSERT time-to-completion by splitting a large transaction into smaller ones, but only if the table you are inserting into has NO indexes or constraints applied, and NO default field values that would have to contend for a shared resource under multiple concurrent transactions. In that case, splitting the insert into several distinct parts and submitting each concurrently as a separate process will complete the job in significantly less time, roughly as in the sketch below.
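
A rough sketch of that idea, again with a hypothetical items table and pre-split CSV chunks (COPY is used for each chunk here, but the same applies to batches of INSERTs); each load runs in its own connection, so the two proceed concurrently:

    -- Connection / process 1:
    COPY items FROM '/tmp/items_part1.csv' WITH (FORMAT csv);

    -- Connection / process 2, started at the same time from a separate client:
    COPY items FROM '/tmp/items_part2.csv' WITH (FORMAT csv);

    -- With no indexes, constraints, or contended defaults on the table,
    -- the two loads do not block each other and can overlap.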

Upvotes: 2

Szymon Lipiński

Reputation: 28644

No; the main way to make it much faster is to do all the inserts in one transaction. Using multiple transactions, or no explicit transaction at all (so each statement commits on its own), is much slower, as sketched below.
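
For example (table and values are hypothetical), one explicit transaction means one commit for the whole batch instead of one commit per row:

    -- Slow: each INSERT is its own implicit transaction and its own commit.
    INSERT INTO items (id, name) VALUES (1, 'a');
    INSERT INTO items (id, name) VALUES (2, 'b');

    -- Faster: one explicit transaction around the whole batch.
    BEGIN;
    INSERT INTO items (id, name) VALUES (1, 'a');
    INSERT INTO items (id, name) VALUES (2, 'b');
    -- ... the remaining rows ...
    COMMIT;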

Also try to use COPY, which is even faster: http://www.postgresql.org/docs/9.1/static/sql-copy.html

If you really have to use INSERTs, you can also try dropping all indexes on the table and recreating them after the data is loaded.

This can be interesting as well: http://www.postgresql.org/docs/9.1/static/populate.html

Upvotes: 5
