Joe
Joe

Reputation: 25

Pass multiple sets or arrays of values to a function

I'm writing a PL/pgSQL function in PostgreSQL 9.3.10 to return who has attended certain classes/sessions from the following table:

Attendance
+-------+---------+---------+
| Class | Section |  Name   |
+-------+---------+---------+
|    1  |      1  | Amy     |
|    1  |      1  | Bill    |
|    1  |      2  | Charlie |
|    1  |      2  | Dan     |
|    2  |      1  | Emily   |
|    2  |      1  | Fred    |
|    2  |      2  | George  |
+-------+---------+---------+

What I want to do is, given a array of class/section id pairs (int[][]), return all people who are in those classes/sections. For example my_func(ARRAY[[1,1],[2,2]]) returns:

+-------+---------+---------+
| Class | Section |  Name   |
+-------+---------+---------+
|    1  |      1  | Amy     |
|    1  |      1  | Bill    |
|    2  |      2  | George  |
+-------+---------+---------+

If I knew the pairs beforehand, it would be a simple:

SELECT * FROM attendance 
WHERE ((class = 1 AND section = 1) OR (class = 2 AND section = 2));

Instead, the pairs will be a parameter of the function.

Right now, the only way I can think of doing this is to have the function essentially build an SQL query string by appending a bunch of WHERE clauses to the end of the query and then calling EXECUTE. Is there a better way to get my result?

EDIT: I implemented the suggestion @Erwin's suggestion and am currently able to get the results I want. Unfortunately, it appears as though it is fairly slow. Here is the function I am running:

CREATE OR REPLACE FUNCTION public.get_attendance(int[])
  RETURNS  TABLE(
    class_c int,
    section_c int
  )
AS
$BODY$
  BEGIN
    RETURN QUERY
      SELECT class, section
      FROM generate_subscripts($1, 1) as i 
      INNER JOIN attendance ON attendance.class = $1[i][1]
                            AND  attendance.section = $1[i][2];

  END;
$BODY$
LANGUAGE plpgsql VOLATILE;

Querying it like so:

SELECT *  FROM get_attendance(ARRAY[[1,15],[2,15],[3,8]]);

And I am getting the following EXPLAIN ANALYZE output

Merge Join  (cost=60.26..50139.72 rows=30840 width=8) (actual time=44.174..142.100 rows=25290 loops=1)
  Merge Cond: ((attendance.class = (('{{1,15},{2,15},{3,8}}'::integer[])[i.i][1])) AND (attendance.section = (('{{1,15},{2,15},{3,8}}'::integer[])[i.i][2])))
  ->  Index Only Scan using class_section_idx on attendance  (cost=0.43..43372.25 rows=1233588 width=8) (actual time=0.009..86.625 rows=1145046 loops=1)
        Heap Fetches: 0
  ->  Sort  (cost=59.83..62.33 rows=1000 width=4) (actual time=0.010..0.757 rows=10031 loops=1)
        Sort Key: (('{{1,15},{2,15},{3,8}}'::integer[])[i.i][1]), (('{{1,15},{2,15},{3,8}}'::integer[])[i.i][2])
        Sort Method: quicksort  Memory: 25kB
        ->  Function Scan on generate_subscripts i  (cost=0.00..10.00 rows=1000 width=4) (actual time=0.006..0.007 rows=3 loops=1)

The problem is that the query is scanning through all attendances in the attendance table without filtering them until the join. Is there any way to fix this?

Upvotes: 1

Views: 6245

Answers (2)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 656706

Postgres 9.4 or later

There is a more elegant way now. Pass separate 1D-arrays for values of class and section, and unnest in parallel. Same array positions match up:

CREATE OR REPLACE FUNCTION f_attendance(_class_arr int[], _section_arr int[])
  RETURNS SETOF attendance
  LANGUAGE sql ROWS 10 STABLE AS
$func$
SELECT a.*
FROM   unnest(_class_arr, _section_arr) a(class, section)
JOIN   attendance a USING (class, section);
$func$;

Call:

SELECT * FROM f_attendance('{1,2}', '{1,2}');

See:

Postgres 9.3 (original answer)

You can achieve that with a simple SQL function. Key feature is the function generate_subscripts():

CREATE OR REPLACE FUNCTION f_attendance(_arr2d int[])
  RETURNS SETOF attendance
  LANGUAGE sql ROWS 10 STABLE AS
$func$
SELECT a.*
FROM   generate_subscripts($1, 1) i
JOIN   attendance a ON a.class   = $1[i][1]
                   AND a.section = $1[i][2];
$func$;

Call:

SELECT * FROM f_attendance(ARRAY[[1,1],[2,2]]);

Or the same with an array literal - which is more convenient in some contexts, especially with prepared statements:

SELECT * FROM f_attendance('{{1,1},{2,2}}');

The function always expects a 2D array. Even if you pass a single pair, nest it:

SELECT * FROM f_attendance('{{1,1}}');

Audit of your implementation

  1. You made the function VOLATILE, but it can be STABLE. The manual:

    Because of this snapshotting behavior, a function containing only SELECT commands can safely be marked STABLE.

    Related:

  2. You also use LANGUAGE plpgsql instead of sql, which makes sense if you execute the function multiple times in the same session. But then you must also make it STABLE or you lose that potential performance benefit. The manual once more:

    STABLE and IMMUTABLE functions use a snapshot established as of the start of the calling query, whereas VOLATILE functions obtain a fresh snapshot at the start of each query they execute.

  3. Your EXPLAIN output shows an Index Only Scan, not a sequential scan like you suspect in your comment.

  4. There is also a sort step in your EXPLAIN output that does not match the code you show. Are you sure you copied the right EXPLAIN output? How did you obtain it anyway? PL/pgSQL functions are black boxes to EXPLAIN. Did you use auto_explain? Details:

  5. The Postgres query planner has no idea how many array elements the passed parameter will have, so it is hard to plan the query and it may default to a sequential scan (depending on more factors). You can help by declaring the expected number of rows. If you typically don't have more than 10 items add ROWS 10 like I did now above. And test again.

Upvotes: 4

Clodoaldo Neto
Clodoaldo Neto

Reputation: 125244

If you pass an array of records to the function it is simple:

with attendance (class, section, name) as(values
    (1, 1, 'Amy'),
    (1, 1, 'Bill'),
    (1, 2, 'Charlie'),
    (1, 2, 'Dan'),
    (2, 1, 'Emily'),
    (2, 1, 'Fred'),
    (2, 2, 'George')
)
select *
from attendance
where (class, section) = any(array[(1,1),(2,2)])
;
 class | section |  name  
-------+---------+--------
     1 |       1 | Amy
     1 |       1 | Bill
     2 |       2 | George

Upvotes: 1

Related Questions