xi20
xi20

Reputation: 139

Passing array to WHERE IN on multiple columns in a function

I have a table as following:

create table test
(
  id varchar(20),
  text varchar(20)
);
    
insert into test values(1, 'a')
insert into test values(2, 'b')
insert into test values(3, 'c')

To check multi-column in where condition, I can do this:

SELECT * FROM test 
WHERE (id, text) = ANY (VALUES('1', 'a')
                            , ('2', 'b')
                            , ('3', 'c'));

I have a function that takes 2 arrays as input:

create or replace function test_function
(
    p_ids     in varchar[],
    p_texts   in varchar[]
)
returns table (p_id   varchar,
               p_text varchar) as $$
begin
    return query 
    SELECT id, text 
    FROM test 
    WHERE (id, text) = ANY (VALUES('1', 'a')
                                , ('2', 'b')
                                , ('3', 'c'));
end;
$$ LANGUAGE plpgsql;

My question is, what is the most efficient (performance-wise) way to to replace ANY (VALUES('1', 'a'), ('2', 'b'), ('3', 'c')) with the actual array input (p_ids and p_texts)?

Upvotes: 1

Views: 76

Answers (1)

Zegarek
Zegarek

Reputation: 26302

"Most efficient" in terms of syntax

Some of the choices are:

  1. INNER JOIN with variadic unnest() instead of filtering in WHERE
  2. Single-argument unnest() in a row() constructor, inside an array() constructor
  3. Variadic unnest() in that or an IN subquery

The first one is not only the shortest, but also the quickest, as demonstrated lower in this post.

demo at db<>fiddle

create or replace function test_function
(   p_ids     in varchar[],
    p_texts   in varchar[]
)
returns table (p_id   varchar,
               p_text varchar) as $f$
begin
    return query 
    SELECT id, text 
    FROM test 
    JOIN unnest(p_ids,p_texts)AS u(id,text) USING(id,text);
    --NATURAL JOIN unnest(p_ids,p_texts)AS u(id,text); --discouraged syntax
    --WHERE (id, text) = ANY(ARRAY(select (unnest(p_ids),unnest(p_texts))));
    --WHERE (id, text) = ANY(ARRAY(select u from unnest(p_ids,p_texts)u));
    --WHERE (id, text) IN (select*from unnest(p_ids,p_texts));
end;
$f$ LANGUAGE plpgsql;
select*from test_function('{1,2,3,8,9}','{a,b,c,x}');
p_id p_text
1 a
2 b
3 c

Watch out for null-padding in case they are of different length

select row( unnest(array['a','b','c','d'])
           ,unnest(array[1,2])
          ); 
row
(a,1)
(b,2)
(c,)
(d,)

"Most efficient" in terms of performance

The join with variadic unnest() seems to perform best here (execution times in seconds, lower is better):
demo2 at db<>fiddle

variant avg a min mi max mx sum sm stddev st mode md
join_variadic_unnest 0.000297 1 0.000240 2 0.000469 1 0.006236 1 0.000080 1 0.000242 2
in_variadic_unnest 0.001278 2 0.000162 1 0.022200 2 0.026829 2 0.004795 2 0.000170 1
array_row_2unnest 0.469599 3 0.459486 4 0.492036 4 0.861575 3 0.009940 3 0.459486 4
array_variadic_unnest 0.469757 4 0.442742 3 0.490769 3 0.864888 4 0.013638 4 0.442742 3

Above are timings of each method searching a few times for 11 rows that match two 20-element arrays in a 50k sample.

Here's the two winners searching 100 times for 262 rows based on two 50-element arrays in a 500k sample:

variant avg a min mi max mx sum sm stddev st mode md
join_variadic_unnest 0.000497 1 0.000314 1 0.003766 1 0.050234 1 0.000565 1 0.000316 1
in_variadic_unnest 0.000558 2 0.000343 2 0.006439 2 0.056383 2 0.000793 2 0.000345 2

Note that you can use your target table as if it was a type:
demo3 at db<>fiddle

create or replace function test_function
(   tests   in test[] 
)returns setof test as 
$f$ SELECT id
         , text 
    FROM test 
    NATURAL JOIN unnest(tests);
$f$ language SQL;

Then

select*from test_function(array[ (1,'a')::test
                                ,(2,'b')::test
                                ,(3,'c')::test])

Or

select*from test_function('{"(1,a)","(2,b)","(3,c)"}'::test[])

Upvotes: 2

Related Questions