ehacinom
ehacinom

Reputation: 8914

How to update redshift column: simple text replacement

I have a large target table with columns (id, value). I want to update value='old' to value='new'.

The simplest way would be to UPDATE target SET value='new' WHERE value='old';

However, this deletes and creates new rows and is not recommended, possibly. So I tried to do a merge column update:

# staging
CREATE TABLE stage (LIKE target INCLUDING DEFAULTS);
INSERT INTO stage (SELECT id, value FROM target WHERE value=`old`);
UPDATE stage SET value='new' WHERE value='old'; # ??? how do you update value?

# merge
begin transaction;
UPDATE target 
SET value = stage.value FROM stage
WHERE target.id = stage.id and target.distkey = stage.distkey; # collocated join?
end transaction;

DROP TABLE stage;
  1. This can't be the best way of creating the table stage: I have to do all these UPDATE delete/writes when I update this way. Is there a way to do it in the INSERT?
  2. Is it necessary to force the collocated join when I use CREATE TABLE LIKE?

Upvotes: 1

Views: 2787

Answers (2)

ehacinom
ehacinom

Reputation: 8914

From a redshift developer:

This case doesn't require an upsert, or update+insert, and it is fine to just run the update:

UPDATE target SET value='new' WHERE value='old';

Another way would be to INSERT the rows you need and DELETE the other rows, but that's unnecessarily complicated.

Upvotes: 2

demircioglu
demircioglu

Reputation: 3465

Are you updating all the rows in the table?

If yes you can use CTAS (create table as) which is recommended method

Assuming you table looks like this

table1
id, col1,col2, value

You can use the following SQL to create a new table

CREATE TABLE tmp_table AS 
  SELECT id, col1,col2, 'new_value' 
  FROM table1;

After you verify data in tmp_table

DROP TABLE table1;
ALTER TABLE tmp_table RENAME TO table1;

If you are not updating all the rows you can use a filter to do a CTAS and insert the rest of the rows to the new table, let me know if you need more info if this is the case

CREATE TABLE tmp_table AS 
  SELECT id, col1,col2, 'new_value' 
  FROM table1
  WHERE value = 'old'

INSERT INTO tmp_table SELECT * from table1;

Next step would be DROP the tmp table and rename table1

Update: Based on your comment you can do the following, let me know if this solves your case.

This method basically creates a new table to replace your existing table. I have used some of your code

CREATE TABLE stage (LIKE target INCLUDING DEFAULTS);
INSERT INTO stage SELECT id, 'new' FROM target WHERE value=`old`;

Above INSERT inserts rows to be updated with 'new', no need to run an UPDATE after this.

Bring unchanged rows

INSERT INTO stage SELECT id, value FROM target WHERE value!=`old`;

After this point you have target table which is your original table intact

stage table will have both sets of rows, updated rows with 'new' value and rows you did not want to change

To replace your target with stage

DROP TABLE target;

or to keep it further verification

ALTER TABLE target RENAME TO target_old; 

ALTER TABLE stage RENAME TO target; 

Upvotes: 2

Related Questions