captnolimar
captnolimar

Reputation: 105

Update substrings using lookup table and replace function

Here's my setup:

Table 1 (table_with_info): Contains a list of varchars with substrings that I'd like to replace.

Table 2 (sub_info): Contains two columns: the substring in table_with_info that I'd like to replace and the string I'd like to replace it with.

What I'd like to do is replace all the substrings in table_with_info with their substitutions in sub_info.

This works to a point but the issue is that select replace(...) returns a new row for each one of the substituted words replaced and doesn't replace all of the ones in an individual row.

I'm explaining the best I can but I don't know if it's too clear. Here's the code an example of what's happening/what I'd like to happen.

Here's my code:

create table table_with_info
(
    val varchar
);

insert into table_with_info values
  ('this this is test data');

create table sub_info
(
    word_from varchar,
    word_to varchar
);

insert into sub_info values
  ('this','replace1')
, ('test', 'replace2');

update table_with_info set val = (select replace("val", "word_from", "word_to")
from "table_with_info", "sub_info"

the update() function doesn't work as select() returns two rows:

Row 1: replace1 replace1 is test data
Row 2: this this is replace2 data

so what I'd like for it for the select statement to return is:

Row 1: replace1 replace1 is test data

Any thoughts? I can't create UDFs on the system I'm running.

Upvotes: 2

Views: 1248

Answers (2)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 657912

Your UPDATE statement is incorrect in multiple ways. Consult the manual before you try to run anything like this again. You introduce two cross joins that would make this statement extremely expensive, besides yielding nonsense.

To do this properly, you need to administer each UPDATE sequentially. In a single statement, one row version eliminates the other, while each replace would use the same original row version. You can use a DO statement for this or wrap it in a plpgsql function for instance:

DO
$do$
DECLARE
   r sub_info;
BEGIN

FOR r IN
   TABLE sub_info
   -- SELECT * FROM sub_info ORDER BY ??? -- order is relevant
LOOP
   UPDATE table_with_info
   SET    val = replace(val, r.word_from, r.word_to) 
   WHERE  val LIKE ('%' || r.word_from || '%'); -- avoid empty updates
END LOOP;

END
$do$;
  • Be aware, that the order in which updates are applied can make a difference! If the first update creates a string where the second matches (but not otherwise) ..
    So, order your columns in sub_info if that can be relevant.

  • Avoid empty updates. Without the additional WHERE clause, you would write many new row versions without changing anything. Expensive and useless.

  • double-quotes are optional for legal, lower-case names.

->SQLfiddle

Upvotes: 4

Denis de Bernardy
Denis de Bernardy

Reputation: 78523

Expanding on Erwin's answer, a do block with dynamic SQL can do the trick as well:

do $$
declare
  rec record;
  repl text;
begin
  repl := 'val'; -- quote_ident() this if needed
  for rec in select word_from, word_to from sub_info
  loop
    repl := 'replace(' || repl || ', '
                       || quote_literal(rec.word_from) || ', '
                       || quote_literal(rec.word_to) || ')';
  end loop;
  -- now do them all in a single query
  execute 'update ' || 'table_with_info'::regclass || ' set val = ' || repl;
end;
$$ language plpgsql;

Optionally, build a like parameter in a similar way to avoid updating rows needlessly.

Upvotes: 2

Related Questions