user674052

Reputation: 106

UPDATE or INSERT & DELETE? Which is better for storage / performance with large text columns?

I have a Postgres DB as part of an application that currently has text fields being populated with about 300K of text.

There are about 3 columns in the table that can routinely be populated with this amount of information. This is causing the tables to be over a gig for about 100 rows. I'd like to delete the data from those columns without deleting the rest of the information in the row.

Is an UPDATE that sets those columns to '' better than an INSERT INTO ... SELECT where I pick only the columns I need and then DELETE the existing row? It's a 3rd-party application, so rewriting the way it stores the data isn't possible at this point.

Upvotes: 2

Views: 3543

Answers (2)

Erwin Brandstetter

Reputation: 658062

First of all, with 3 columns of ~300 kB of text each, the table should not be over 1 GB in size with just 100 rows. Do you have autovacuum enabled? Or do you run VACUUM regularly? Read about vacuum in the manual and check your settings.
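A quick way to check, assuming the table is called tbl (a placeholder name, not from the question):

    -- Is autovacuum on, and have dead tuples piled up in the table?
    SHOW autovacuum;

    SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
    FROM   pg_stat_user_tables
    WHERE  relname = 'tbl';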

As to the question of UPDATE versus INSERT / DELETE: due to PostgreSQL's MVCC model, every UPDATE effectively writes a new row version. The old row version is kept and eventually purged by (auto)vacuum. Use UPDATE, it's simpler. The key to recovering space and performance is vacuuming.
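For illustration, a minimal sketch, assuming the table is tbl and the three big columns are big_col1 .. big_col3 (all placeholder names):

    -- Clear the three big columns, then let vacuum reclaim the space.
    UPDATE tbl
    SET    big_col1 = NULL
         , big_col2 = NULL
         , big_col3 = NULL
    WHERE  big_col1 IS NOT NULL
       OR  big_col2 IS NOT NULL
       OR  big_col3 IS NOT NULL;

    VACUUM ANALYZE tbl;  -- marks dead row versions (and their TOAST entries) as reusable space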

However, if you have to process all or most rows, and you can afford to lock the table exclusively (or there is no concurrent access), it will be faster to rewrite the table completely, either by creating a new one or by copying the data through a temporary table, as sketched below.
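A sketch of such a rewrite, again with placeholder names (tbl, id, other_col, big_col1 .. big_col3) and assuming no foreign keys reference tbl, since TRUNCATE would fail otherwise:

    BEGIN;
    -- Copy the rows once, keeping the small columns and emptying the big ones.
    CREATE TEMP TABLE tmp AS
    SELECT id
         , other_col                -- every column you want to keep
         , NULL::text AS big_col1   -- big columns emptied
         , NULL::text AS big_col2
         , NULL::text AS big_col3
    FROM   tbl;

    TRUNCATE tbl;                    -- drops old heap and TOAST data instantly; takes an ACCESS EXCLUSIVE lock
    INSERT INTO tbl (id, other_col, big_col1, big_col2, big_col3)
    SELECT id, other_col, big_col1, big_col2, big_col3
    FROM   tmp;
    COMMIT;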

There is also the related command CLUSTER, which rewrites the whole table and all indexes in pristine condition. It beats VACUUM FULL because it is effectively also a REINDEX. You can try it once to see the real size of the table without dead tuples.
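For example (placeholder names; both commands take an exclusive lock on the table for the duration):

    -- Rewrite the table and all its indexes, ordered by the given index.
    -- tbl_pkey stands in for any existing index on the table.
    CLUSTER tbl USING tbl_pkey;

    -- VACUUM FULL tbl;   -- alternative that also rewrites the table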

Concerning performance: PostgreSQL writes the (compressed) content of big columns to TOAST tables, so standard queries perform the same whether a column holds 30 kB or 30 MB. Column size only affects the time it takes to retrieve the big values for the rows that are actually returned.

As out-of-line (TOASTed) data also gets compressed, your table should not be over 1 GB (including the TOAST table), even though on-disk representation in the database normally occupies 2-3 times the space of the raw data. Since yours is, that normally means you are not vacuuming and a lot of dead tuples are cluttering up your table.
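To see where the space actually goes, you can compare heap size and total size (placeholder table name):

    -- Heap only vs. everything (TOAST table and indexes included).
    SELECT pg_size_pretty(pg_relation_size('tbl'))       AS heap_only
         , pg_size_pretty(pg_total_relation_size('tbl')) AS incl_toast_and_indexes;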

One side effect of TOAST tables is that UPDATE can perform better than INSERT / DELETE because:

During an UPDATE operation, values of unchanged fields are normally preserved as-is; so an UPDATE of a row with out-of-line values incurs no TOAST costs if none of the out-of-line values change.

Shouldn't matter in your case, though, as your plan is to NULL the big columns anyway.

Upvotes: 8

catchdave

Reputation: 9327

Either way should work. The UPDATE is simpler, so go with that.

The only benefit I can think of for INSERTing into the same/new table is reduced contention (because you will be writing into a new table, or at least a separate row). But that's probably moot with a table of only 100 rows (even though they are extremely wide rows).

Upvotes: 0
