Reputation: 25
I'm writing a PL/pgSQL function in PostgreSQL 9.3.10 to return who has attended certain classes/sessions from the following table:
Attendance
+-------+---------+---------+
| Class | Section | Name |
+-------+---------+---------+
| 1 | 1 | Amy |
| 1 | 1 | Bill |
| 1 | 2 | Charlie |
| 1 | 2 | Dan |
| 2 | 1 | Emily |
| 2 | 1 | Fred |
| 2 | 2 | George |
+-------+---------+---------+
What I want to do is, given a array of class/section id pairs (int[][]
), return all people who are in those classes/sections. For example my_func(ARRAY[[1,1],[2,2]])
returns:
+-------+---------+---------+
| Class | Section | Name |
+-------+---------+---------+
| 1 | 1 | Amy |
| 1 | 1 | Bill |
| 2 | 2 | George |
+-------+---------+---------+
If I knew the pairs beforehand, it would be a simple:
SELECT * FROM attendance
WHERE ((class = 1 AND section = 1) OR (class = 2 AND section = 2));
Instead, the pairs will be a parameter of the function.
Right now, the only way I can think of doing this is to have the function essentially build an SQL query string by appending a bunch of WHERE
clauses to the end of the query and then calling EXECUTE
. Is there a better way to get my result?
EDIT: I implemented the suggestion @Erwin's suggestion and am currently able to get the results I want. Unfortunately, it appears as though it is fairly slow. Here is the function I am running:
CREATE OR REPLACE FUNCTION public.get_attendance(int[])
RETURNS TABLE(
class_c int,
section_c int
)
AS
$BODY$
BEGIN
RETURN QUERY
SELECT class, section
FROM generate_subscripts($1, 1) as i
INNER JOIN attendance ON attendance.class = $1[i][1]
AND attendance.section = $1[i][2];
END;
$BODY$
LANGUAGE plpgsql VOLATILE;
Querying it like so:
SELECT * FROM get_attendance(ARRAY[[1,15],[2,15],[3,8]]);
And I am getting the following EXPLAIN ANALYZE output
Merge Join (cost=60.26..50139.72 rows=30840 width=8) (actual time=44.174..142.100 rows=25290 loops=1)
Merge Cond: ((attendance.class = (('{{1,15},{2,15},{3,8}}'::integer[])[i.i][1])) AND (attendance.section = (('{{1,15},{2,15},{3,8}}'::integer[])[i.i][2])))
-> Index Only Scan using class_section_idx on attendance (cost=0.43..43372.25 rows=1233588 width=8) (actual time=0.009..86.625 rows=1145046 loops=1)
Heap Fetches: 0
-> Sort (cost=59.83..62.33 rows=1000 width=4) (actual time=0.010..0.757 rows=10031 loops=1)
Sort Key: (('{{1,15},{2,15},{3,8}}'::integer[])[i.i][1]), (('{{1,15},{2,15},{3,8}}'::integer[])[i.i][2])
Sort Method: quicksort Memory: 25kB
-> Function Scan on generate_subscripts i (cost=0.00..10.00 rows=1000 width=4) (actual time=0.006..0.007 rows=3 loops=1)
The problem is that the query is scanning through all attendances in the attendance table without filtering them until the join. Is there any way to fix this?
Upvotes: 1
Views: 6245
Reputation: 656706
There is a more elegant way now. Pass separate 1D-arrays for values of class
and section
, and unnest in parallel. Same array positions match up:
CREATE OR REPLACE FUNCTION f_attendance(_class_arr int[], _section_arr int[])
RETURNS SETOF attendance
LANGUAGE sql ROWS 10 STABLE AS
$func$
SELECT a.*
FROM unnest(_class_arr, _section_arr) a(class, section)
JOIN attendance a USING (class, section);
$func$;
Call:
SELECT * FROM f_attendance('{1,2}', '{1,2}');
See:
You can achieve that with a simple SQL function. Key feature is the function generate_subscripts()
:
CREATE OR REPLACE FUNCTION f_attendance(_arr2d int[])
RETURNS SETOF attendance
LANGUAGE sql ROWS 10 STABLE AS
$func$
SELECT a.*
FROM generate_subscripts($1, 1) i
JOIN attendance a ON a.class = $1[i][1]
AND a.section = $1[i][2];
$func$;
Call:
SELECT * FROM f_attendance(ARRAY[[1,1],[2,2]]);
Or the same with an array literal - which is more convenient in some contexts, especially with prepared statements:
SELECT * FROM f_attendance('{{1,1},{2,2}}');
The function always expects a 2D array. Even if you pass a single pair, nest it:
SELECT * FROM f_attendance('{{1,1}}');
You made the function VOLATILE
, but it can be STABLE
. The manual:
Because of this snapshotting behavior, a function containing only
SELECT
commands can safely be markedSTABLE
.
Related:
You also use LANGUAGE plpgsql
instead of sql
, which makes sense if you execute the function multiple times in the same session. But then you must also make it STABLE
or you lose that potential performance benefit. The manual once more:
STABLE
andIMMUTABLE
functions use a snapshot established as of the start of the calling query, whereas VOLATILE functions obtain a fresh snapshot at the start of each query they execute.
Your EXPLAIN
output shows an Index Only Scan, not a sequential scan like you suspect in your comment.
There is also a sort step in your EXPLAIN
output that does not match the code you show. Are you sure you copied the right EXPLAIN
output? How did you obtain it anyway? PL/pgSQL functions are black boxes to EXPLAIN
. Did you use auto_explain
? Details:
The Postgres query planner has no idea how many array elements the passed parameter will have, so it is hard to plan the query and it may default to a sequential scan (depending on more factors). You can help by declaring the expected number of rows. If you typically don't have more than 10 items add ROWS 10
like I did now above. And test again.
Upvotes: 4
Reputation: 125244
If you pass an array of records to the function it is simple:
with attendance (class, section, name) as(values
(1, 1, 'Amy'),
(1, 1, 'Bill'),
(1, 2, 'Charlie'),
(1, 2, 'Dan'),
(2, 1, 'Emily'),
(2, 1, 'Fred'),
(2, 2, 'George')
)
select *
from attendance
where (class, section) = any(array[(1,1),(2,2)])
;
class | section | name
-------+---------+--------
1 | 1 | Amy
1 | 1 | Bill
2 | 2 | George
Upvotes: 1