Passing array to WHERE IN on multiple columns in a function

Question

I have a table as following:

create table test
(
  id varchar(20),
  text varchar(20)
);
    
insert into test values(1, 'a')
insert into test values(2, 'b')
insert into test values(3, 'c')

To check multi-column in where condition, I can do this:

SELECT * FROM test 
WHERE (id, text) = ANY (VALUES('1', 'a')
                            , ('2', 'b')
                            , ('3', 'c'));

I have a function that takes 2 arrays as input:

create or replace function test_function
(
    p_ids     in varchar[],
    p_texts   in varchar[]
)
returns table (p_id   varchar,
               p_text varchar) as $$
begin
    return query 
    SELECT id, text 
    FROM test 
    WHERE (id, text) = ANY (VALUES('1', 'a')
                                , ('2', 'b')
                                , ('3', 'c'));
end;
$$ LANGUAGE plpgsql;

My question is, what is the most efficient (performance-wise) way to to replace ANY (VALUES('1', 'a'), ('2', 'b'), ('3', 'c')) with the actual array input (p_ids and p_texts)?

Zegarek · Accepted Answer

"Most efficient" in terms of syntax

Some of the choices are:

INNER JOIN with variadic unnest() instead of filtering in WHERE
Single-argument unnest() in a row() constructor, inside an array() constructor
Variadic unnest() in that or an IN subquery

The first one is not only the shortest, but also the quickest, as demonstrated lower in this post.

_{demo at db<>fiddle}

create or replace function test_function
(   p_ids     in varchar[],
    p_texts   in varchar[]
)
returns table (p_id   varchar,
               p_text varchar) as $f$
begin
    return query 
    SELECT id, text 
    FROM test 
    JOIN unnest(p_ids,p_texts)AS u(id,text) USING(id,text);
    --NATURAL JOIN unnest(p_ids,p_texts)AS u(id,text); --discouraged syntax
    --WHERE (id, text) = ANY(ARRAY(select (unnest(p_ids),unnest(p_texts))));
    --WHERE (id, text) = ANY(ARRAY(select u from unnest(p_ids,p_texts)u));
    --WHERE (id, text) IN (select*from unnest(p_ids,p_texts));
end;
$f$ LANGUAGE plpgsql;

select*from test_function('{1,2,3,8,9}','{a,b,c,x}');

p_id	p_text
1	a
2	b
3	c

Watch out for null-padding in case they are of different length

select row( unnest(array['a','b','c','d'])
           ,unnest(array[1,2])
          );

row
(a,1)
(b,2)
(c,)
(d,)

"Most efficient" in terms of performance

The join with variadic unnest() seems to perform best here (execution times in seconds, lower is better):
_{demo2 at db<>fiddle}

variant	avg	a	min	mi	max	mx	sum	sm	stddev	st	mode	md
join_variadic_unnest	0.000297	1	0.000240	2	0.000469	1	0.006236	1	0.000080	1	0.000242	2
in_variadic_unnest	0.001278	2	0.000162	1	0.022200	2	0.026829	2	0.004795	2	0.000170	1
array_row_2unnest	0.469599	3	0.459486	4	0.492036	4	0.861575	3	0.009940	3	0.459486	4
array_variadic_unnest	0.469757	4	0.442742	3	0.490769	3	0.864888	4	0.013638	4	0.442742	3

Above are timings of each method searching a few times for 11 rows that match two 20-element arrays in a 50k sample.

Here's the two winners searching 100 times for 262 rows based on two 50-element arrays in a 500k sample:

variant	avg	a	min	mi	max	mx	sum	sm	stddev	st	mode	md
join_variadic_unnest	0.000497	1	0.000314	1	0.003766	1	0.050234	1	0.000565	1	0.000316	1
in_variadic_unnest	0.000558	2	0.000343	2	0.006439	2	0.056383	2	0.000793	2	0.000345	2

Note that you can use your target table as if it was a type:
_{demo3 at db<>fiddle}

create or replace function test_function
(   tests   in test[] 
)returns setof test as 
$f$ SELECT id
         , text 
    FROM test 
    NATURAL JOIN unnest(tests);
$f$ language SQL;

Then

select*from test_function(array[ (1,'a')::test
                                ,(2,'b')::test
                                ,(3,'c')::test])

Or

select*from test_function('{"(1,a)","(2,b)","(3,c)"}'::test[])

Passing array to WHERE IN on multiple columns in a function

Answers (1)

"Most efficient" in terms of syntax

"Most efficient" in terms of performance

Related Questions