Column Copy and Update vs. Column Create and Insert

I have a table with 32 Million rows and 31 columns in PostgreSQL 9.2.10. I am altering the table by adding columns with updated values.

For example, if the initial table is:

id     initial_color
--     -------------
1      blue
2      red
3      yellow

I am modifying the table so that the result is:

id     initial_color     modified_color
--     -------------     --------------
1      blue              blue_green
2      red               red_orange
3      yellow            yellow_brown

I have code that will read the initial_color column and update the value.

Given that my table has 32 million rows and that I have to apply this procedure on five of the 31 columns, what is the most efficient way to do this? My present choices are:

Copy the column and update the rows in the new column
Create an empty column and insert new values

I could do either option with one column at a time or with all five at once. The columns types are either character varying or character.

Upvotes: 2

Answers (3)

Erwin Brandstetter

Reputation: 658052

The columns types are either character varying or character.

Don't use character, that's a misunderstanding. varchar is ok, but I would suggest just text for arbitrary character data.

Any downsides of using data type "text" for storing strings?

Given that my table has 32 million rows and that I have to apply this procedure on five of the 31 columns, what is the most efficient way to do this?

If you don't have objects (views, foreign keys, functions) depending on the existing table, the most efficient way is create a new table. Something like this ( details depend on the details of your installation):

BEGIN;
LOCK TABLE tbl_org IN SHARE MODE;  -- to prevent concurrent writes

CREATE TABLE tbl_new (LIKE tbl_org INCLUDING STORAGE INCLUDING COMMENTS);

ALTER tbl_new ADD COLUMN modified_color text
            , ADD COLUMN modified_something text;
            -- , etc
INSERT INTO tbl_new (<all columns in order here>)
SELECT <all columns in order here>
    ,  myfunction(initial_color) AS modified_color  -- etc
FROM   tbl_org;
-- ORDER  BY tbl_id;  -- optionally order rows while being at it.

-- Add constraints and indexes like in the original table here

DROP tbl_org;
ALTER tbl_new RENAME TO tbl_org;
COMMIT;

If you have depending objects, you need to do more.

Either was, be sure to add all five at once. If you update each in a separate query you write another row version each time due to the MVCC model of Postgres.

Related cases with more details, links and explanation:

While creating a new table you might also order columns in an optimized fashion:

Calculating and saving space in PostgreSQL

Upvotes: 3

max

Reputation: 20

create new column (modified colour), it will have a value of NULL or blank on all records,

run an update statement, assuming your table name is 'Table'.

update table
set modified_color = 'blue_green'
where initial_color = 'blue'

if I am correct this can also work like this

update table set modified_color = 'blue_green' where initial_color = 'blue';
update table set modified_color = 'red_orange' where initial_color = 'red';
update table set modified_color = 'yellow_brown' where initial_color = 'yellow';

once you have done this you can do another update (assuming you have another column that I will call modified_color1)

update table set 'modified_color1'= 'modified_color'

Upvotes: 0

rpy

Reputation: 4023

Maybe I'm misreading the question, but as far as I know, you have 2 possibilities for creating a table with the extra columns:

CREATE TABLE
This would create a new table and filling could be done using
- CREATE TABLE .. AS SELECT.. for filling with creation or
- using a separate INSERT...SELECT... later on Both variants are not what you seem to want to do, as you stated solution without listing all the fields.
  Also this would require all data (plus the new fields) to be copied.
ALTER TABLE...ADD ...
This creates the new columns. As I'm not aware of any possibility to reference existing column values, you will need an additional UPDATE ..SET... for filling in values.

So, I' not seeing any way to realize a procedure that follows your choice 1.

Nevertheless, copying the (column) data just to overwrite them in a second step would be suboptimal in any case. Altering a table adding new columns is doing minimal I/O. From this, even if there would be a possibility to execute your choice 1, following choice 2 promises better performance by factors.

Thus, do 2 statements one ALTER TABLE adding all your new columns in on go and then an UPDATE providing the new values for these columns will achieve what you want.

Upvotes: 2

Column Copy and Update vs. Column Create and Insert

Answers (3)

Related Questions