gudge

Reputation: 1083

Update multiple rows in a table with different values

I have the following schema:

DROP SCHEMA IF EXISTS s CASCADE;  
CREATE SCHEMA s;

CREATE TABLE "s"."t1"
(
    "c1" BigSerial PRIMARY KEY,
    "c2" BigInt NOT NULL,
    "c3" BigInt
)
WITH (OIDS=FALSE);

INSERT INTO s.t1 (c2, c3) VALUES (10, 100);
INSERT INTO s.t1 (c2, c3) VALUES (20, 200);
INSERT INTO s.t1 (c2, c3) VALUES (30, 300);
INSERT INTO s.t1 (c2, c3) VALUES (40, 400);

PREPARE updateplan (BigInt, BigInt) AS 
    update s.t1 
    SET c3 = $2
    WHERE c2 = $1;

EXECUTE updateplan (20, 250);

PREPARE updatearrayplan(BigInt[], BigInt[]) AS
    for i in size($1)
    DO  
        update s.t1
        SET c3 = $2[$i]
        WHERE c2 = $1[$i]
    END FOR    

EXECUTE updatearrayplan({20, 30}, {275, 375})

/* 20, 200 -> 20, 275 */
/* 30, 300 -> 30, 375 */

After execution of updatearrayplan I expect the rows to have these values: 20 -> 275, 30 -> 375.

Is there a way to update multiple rows with different column values passed in as arrays? Also, is there a guarantee that the order of the arrays will be maintained?
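
Note: as I understand it, PREPARE only accepts a single SQL statement, so the FOR loop in updatearrayplan above is only pseudocode to show intent. A rough sketch of a single-statement prepared plan that pairs the two arrays by position with generate_subscripts might look like this (the alias names pairs, match_c2 and new_c3 are just placeholders):

PREPARE updatearrayplan(BigInt[], BigInt[]) AS
    UPDATE s.t1
    SET c3 = pairs.new_c3
    FROM (
        -- generate_subscripts yields 1..N for the first array;
        -- indexing both arrays with the same subscript keeps them paired by position
        SELECT $1[i] AS match_c2, $2[i] AS new_c3
        FROM generate_subscripts($1, 1) AS i
    ) AS pairs
    WHERE c2 = pairs.match_c2;

EXECUTE updatearrayplan(ARRAY[20, 30], ARRAY[275, 375]);

Because both arrays are read with the same subscript i, the correspondence between each c2 value and its new c3 value is preserved no matter in which order the rows are physically updated.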

Upvotes: 0

Views: 1159

Answers (2)

krokodilko

Reputation: 36127

Try:

WITH arrays AS( 
    SELECT * from 
    unnest(
         ARRAY[20, 30], 
         ARRAY[275, 375]
    ) as xy(x,y)
)
UPDATE t1 
SET c3 = a.y 
FROM arrays a
WHERE c2 = a.x;

See the description of the unnest function in the PostgreSQL documentation on array functions.
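
To see how the two arrays are paired, here is roughly what the unnest call in the FROM clause returns; note that multi-argument unnest is only allowed in the FROM clause and requires PostgreSQL 9.4 or later (the expected output is shown as a comment):

SELECT *
FROM unnest(
     ARRAY[20, 30],
     ARRAY[275, 375]
) AS xy(x, y);

--  x  |  y
-- ----+-----
--  20 | 275
--  30 | 375

The elements are matched by position, so the order of the two arrays is what ties an old c2 value to its new c3 value.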


EDIT


@krokodilko Sorry. I tried the whole day modifying your solution but couldn't make it work.

It could be that you have an older PostgreSQL version. I tested it on version 9.5; it took me only a couple of minutes to get it to work, just copy/paste and change two parameters in the query:

create table t1(
  c2 BIGINT,
  c3 bigint
);


insert into t1( c2, c3 )
select x, x * 100 
from generate_series( 1,1000000 ) x;

CREATE OR REPLACE FUNCTION updatefunc1(BigInt[], BigInt[])
RETURNS void as $$
BEGIN
    FOR i IN array_lower($1, 1) .. array_upper($1, 1)
    LOOP  
        update t1
        SET c3 = $2[i]
        WHERE c2 = $1[i];
    END LOOP;  
END;    
$$
LANGUAGE plpgsql;


CREATE OR REPLACE FUNCTION updatefunc2(BigInt[], BigInt[])
RETURNS void as $$
BEGIN
    WITH arrays AS( 
        SELECT * from 
        unnest( $1, $2  ) as xy(x,y)
    )
    UPDATE t1 
    SET c3 = a.y 
    FROM arrays a
    WHERE c2 = a.x;
END;    
$$
LANGUAGE plpgsql;


select updatefunc1(ARRAY[20], ARRAY[275]);

select updatefunc2(ARRAY[30], ARRAY[555]);

select * from t1 where c2 in (20,30);

Let me know if this is correct or there is a better solution.

It is very correct but ... it's a little slow.

I tested your function for only 100 records:

select updatefunc1(
    array( select * from generate_series(1,100)),
    array( select 22222 from generate_series(1,100))
);

it took over 12 seconds:

Result  (cost=20.00..20.31 rows=1 width=0) (actual time=12259.095..12259.096 rows=1 loops=1)
  Output: updatefunc1(($0)::bigint[], ($1)::bigint[])
  InitPlan 1 (returns $0)

Now compare it to my function, but for 100,000 records:

select updatefunc2(
    array( select * from generate_series(1,100000)),
    array( select 22222 from generate_series(1,100000))
);

the result is 1 second 150 ms:

Result  (cost=20.00..20.31 rows=1 width=0) (actual time=1150.018..1150.123 rows=1 loops=1)
  Output: updatefunc2(($0)::bigint[], ($1)::bigint[])
  InitPlan 1 (returns $0)

The above results mean that your function is:

( 12 / 100 ) / ( 1.150 / 100000 ) = 10434.78

times slower, which in percentage terms is about 1043400 % slower.
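
For reference, the plan lines quoted above look like EXPLAIN ANALYZE VERBOSE output; something along these lines should reproduce them (a sketch, the exact command is not shown here):

EXPLAIN (ANALYZE, VERBOSE)
select updatefunc2(
    array( select * from generate_series(1,100000)),
    array( select 22222 from generate_series(1,100000))
);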


EDIT 2


My version is 9.2.15. It throws up syntax errors

Below is a version that should work on earlier versions of PostgreSQL:

CREATE OR REPLACE FUNCTION updatefunc3(BigInt[], BigInt[])
RETURNS void as $$
BEGIN
    WITH arrays AS( 
        SELECT arr1[ rn ] as x, arr2[ rn ] as y 
        FROM (
            SELECT $1 as arr1, $2 as arr2, generate_subscripts($1, 1) As rn
        ) x
    )
    UPDATE t1 
    SET c3 = a.y 
    FROM arrays a
    WHERE c2 = a.x;
END;    
$$
LANGUAGE plpgsql;

select updatefunc3(ARRAY[40,82,77], ARRAY[333,654]);

select * from t1 where c2 in (40,82,77);
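
To see how the generate_subscripts trick pairs the two arrays on servers without multi-argument unnest, the inner subquery of updatefunc3 can be run on its own with the mismatched arrays from the call above (the expected output is shown as a comment; arr2 has no third element, so the last y is NULL):

SELECT arr1[rn] AS x, arr2[rn] AS y
FROM (
    SELECT ARRAY[40,82,77] AS arr1,
           ARRAY[333,654]  AS arr2,
           generate_subscripts(ARRAY[40,82,77], 1) AS rn
) sub;

--  x  |  y
-- ----+-----
--  40 | 333
--  82 | 654
--  77 |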

And a speed test of updating 100,000 rows is:

select updatefunc3(
    array( select * from generate_series(1,100000)),
    array( select 22222 from generate_series(1,100000))
);

Result  (cost=20.00..20.31 rows=1 width=0) (actual time=1361.358..1361.460 rows=1 loops=1)
  Output: updatefunc3(($0)::bigint[], ($1)::bigint[])
  InitPlan 1 (returns $0)

The time to update 100k rows is below 1.5 seconds.


EDIT 3


@krokodilko: Could you please tell me why your query is so much better? My function goes through each row and updates the elements one by one. Your function also appears to do the same. Is it that all the matching rows are updated simultaneously in your query?

This is because my function runs only one UPDATE command regardless of the number of elements in the arrays, while your function updates elements one by one: for 100 elements it runs 100 UPDATE commands, for 1000 elements it runs 1000 UPDATE commands.
I've done my tests on a table with 1,000,000 rows, but without any index. In my function the update reads the table content only once (doing a full table scan) and updates the matching rows. Your function performs 100 updates, and each one does a full table scan.
If you create an index on c2, the speed of your function increases dramatically; see the test below (please note that the number of elements in this test is increased from 100 to 100,000):

create INDEX t1_c2_ix on t1( c2 );

select updatefunc1(
    array( select * from generate_series(1,100000)),
    array( select 22222 from generate_series(1,100000))
);

Result  (cost=20.00..20.31 rows=1 width=0) (actual time=3430.536..3430.636 rows=1 loops=1)
  Output: updatefunc1(($0)::bigint[], ($1)::bigint[])
  InitPlan 1 (returns $0)

Now the time is only about 3.5 seconds.

And a test of my function after creating the index:

select updatefunc3(
    array( select * from generate_series(1,100000)),
    array( select 22222 from generate_series(1,100000))
);

Result  (cost=20.00..20.31 rows=1 width=0) (actual time=1270.619..1270.724 rows=1 loops=1)
  Output: updatefunc3(($0)::bigint[], ($1)::bigint[])
  InitPlan 1 (returns $0)

The time remains roughly the same, and it is still well over twice as fast as your function.

Upvotes: 1

gudge

Reputation: 1083

My answer:

CREATE OR REPLACE FUNCTION s.updatefunc1(BigInt[], BigInt[])
RETURNS void as $$
BEGIN
    FOR i IN array_lower($1, 1) .. array_upper($1, 1)
    LOOP  
        update s.t1
        SET c3 = $2[i]
        WHERE c2 = $1[i];
    END LOOP;  
END;    
$$
LANGUAGE plpgsql;

select s.updatefunc1(ARRAY[20], ARRAY[275]);  

This does work. I get my desired answer:

   SELECT c2, c3 FROM s.t1;
   c2 | c3  
   ----+-----
   10 | 100
   30 | 300
   40 | 400
   20 | 275  --> Updated

Let me know if this is correct or there is a better solution.

Upvotes: 0
