Reputation: 21
I am writing a function in PostgreSQL. It basically does 3 steps:

1. Loop over every row of source_tab where col5 = 'abc'.
2. Check whether a row with the same col1 already exists in dest_tab.
3. Insert the row if it is not found, otherwise update it.

Instead of looping like this, would a single insert/update query be faster than the approach above? And how can I achieve the same result with a single query instead of looping through every record and doing the update/insert?

My current approach is as below:
CREATE OR REPLACE FUNCTION fun1()
  RETURNS void AS
$BODY$
DECLARE
   source_tab_row RECORD;
   v_col1 text;
   v_col2 text;
   v_col3 text;
   v_col4 double precision;
   cnt    integer;
BEGIN
   FOR source_tab_row IN (SELECT * FROM source_tab WHERE col5 = 'abc')
   LOOP
      v_col1 := source_tab_row.col1;
      v_col2 := source_tab_row.col2;
      v_col3 := source_tab_row.col3;
      v_col4 := source_tab_row.col4;

      SELECT count(*) INTO cnt FROM dest_tab WHERE col1 = v_col1;

      IF cnt = 0 THEN
         -- if no record is found, insert it
         INSERT INTO dest_tab(col1, col2, col3, col4)
         VALUES (v_col1, v_col2, v_col3, v_col4);
      ELSE
         -- if a record is found, update it
         UPDATE dest_tab
         SET    col1 = v_col1, col2 = v_col2, col3 = v_col3, col4 = v_col4
         WHERE  col1 = v_col1;
      END IF;
   END LOOP;
END;
$BODY$ LANGUAGE plpgsql;
Upvotes: 2
Views: 549
Reputation: 656804
If you have PostgreSQL 9.1 or later, you should definitely use a data-modifying CTE for this:
WITH x AS (
   UPDATE dest_tab d
   SET    col2 = s.col2
        , col3 = s.col3
        , col4 = s.col4
   FROM   source_tab s
   WHERE  s.col5 = 'abc'
   AND    s.col1 = d.col1
   RETURNING d.col1
   )
INSERT INTO dest_tab(col1, col2, col3, col4)
SELECT s.col1, s.col2, s.col3, s.col4
FROM   source_tab s
LEFT   JOIN x USING (col1)
WHERE  s.col5 = 'abc'
AND    x.col1 IS NULL;
As @Craig already posted, such operations are usually much faster as set-based SQL than when iterating over individual rows.
However, this form is faster and simpler, and it also avoids the inherent (tiny!) race condition to a large extent. To begin with, since this is a single SQL command, the time window is even shorter. Also, if a concurrent transaction should enter competing rows between the UPDATE and the INSERT, you get a duplicate key violation (provided you have a pk / unique constraint, as you should), because you don't query dest_tab a second time but reuse the original set for the INSERT. Faster, better.
If you ever get to see a duplicate key violation: nothing bad happened, just retry the query.
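The duplicate key violation presupposes a unique index on dest_tab.col1. The question does not show the table definitions, so here is a minimal sketch of the assumed DDL (column types taken from the variables declared in the question):

```sql
-- Assumed schema, not shown in the question:
CREATE TABLE dest_tab (
   col1 text PRIMARY KEY  -- the pk / unique constraint that raises the violation
 , col2 text
 , col3 text
 , col4 double precision
);
```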
It does not cover the opposite case, where a concurrent transaction could DELETE a row in the meantime. This is really the less important / frequent case, IMO.
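As an aside beyond the original answer: on PostgreSQL 9.5 or later, the built-in UPSERT covers both race conditions in one atomic statement. A sketch, assuming the same unique constraint on dest_tab.col1:

```sql
INSERT INTO dest_tab (col1, col2, col3, col4)
SELECT s.col1, s.col2, s.col3, s.col4
FROM   source_tab s
WHERE  s.col5 = 'abc'
ON     CONFLICT (col1) DO UPDATE   -- row exists: update it instead
SET    col2 = EXCLUDED.col2
     , col3 = EXCLUDED.col3
     , col4 = EXCLUDED.col4;
```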
If you use plpgsql for this, simplify:
CREATE OR REPLACE FUNCTION fun1()
  RETURNS void AS
$BODY$
DECLARE
   _source source_tab;  -- name of table = row type
BEGIN
   FOR _source IN
      SELECT * FROM source_tab WHERE col5 = 'abc'
   LOOP
      UPDATE dest_tab
      SET    col2 = _source.col2  -- don't update col1, it doesn't change
           , col3 = _source.col3
           , col4 = _source.col4
      WHERE  col1 = _source.col1;

      IF NOT FOUND THEN  -- no row was updated, so insert
         INSERT INTO dest_tab(col1, col2, col3, col4)
         VALUES (_source.col1, _source.col2, _source.col3, _source.col4);
      END IF;
   END LOOP;
END
$BODY$ LANGUAGE plpgsql;
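For completeness, the function is then called like any other void-returning function:

```sql
SELECT fun1();
```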
Upvotes: 2