Reputation: 9883
I need to create a Postgres 9.1 PL/pgSQL stored procedure that, among other parameters, takes a sequence of values that directly reference values in one of my database columns. As far as I can tell, the canonical way to do this in Postgres is an array.
This is a rather basic task, of course. My problem is scalability: My code basically works, but performs badly once the sequences passed in get large (as in a few hundreds or thousands of values):
Even rather simple SELECT statements within my stored procedure using the array in the form
SELECT <some columns>
FROM <some tables>
WHERE <some other select criteria>
AND <column with values selected by array parameter>
IN (SELECT * FROM unnest(<array parameter>))
take several seconds to execute even though the database is not very large yet and there are only tens of values in the array.
My first suspicion was that unnest(...)
is the problem, but selecting only from the table with the column referenced in the array parameter is really fast:
SELECT <some columns>
FROM <table with column ref'd in array parameter>
WHERE <column with values selected by array parameter>
IN (SELECT * FROM unnest(<array parameter>))
only takes a few milliseconds.
My questions:
Upvotes: 1
Views: 3590
Reputation: 656706
How can I make my queries perform better ?
I would expect faster performance if you rewrite your query
SELECT <some columns>
FROM <some tables>
WHERE <some other select criteria>
AND <column with values selected by array parameter>
IN (SELECT * FROM unnest(<array parameter>));
to:
SELECT <some columns>
FROM (SELECT unnest(<array parameter>) AS param) x
JOIN <filtered table> ON <filter column> = x.param
JOIN <other table> ON <join criteria>
WHERE <some other select criteria>;
It sounds like the query planner chooses a suboptimal plan, misjudging the cost of your other WHERE
criteria in comparison to the IN clause. By transforming it to an explicit JOIN
clause you should get a better query plan.
Generally, JOIN
s tend to be faster than large IN
clauses in PostgreSQL.
Is there an alternative to using an array as parameter ?
Yes.
You could create temporary table, fill it and run a query joining against it.
CREATE TEMP TABLE x(id int);
INSERT INTO x VALUES
(1), (2), (17), (18);
SELECT <some columns>
FROM x
JOIN <filtered table> ON <filter column> = x.id
JOIN <other table> ON <join criteria>
WHERE <some other select criteria>;
Or, faster yet, use a CTE for the same purpose:
WITH x(id) AS (
VALUES (1::int), (2), (17), (18) -- type-cast on first element is enough
)
SELECT <some columns>
FROM x
JOIN <filtered table> ON <filter column> = x.id
JOIN <other table> ON <join criteria>
WHERE <some other select criteria>;
As long as you want to use a function, an array parameter, unnested inside would be my choice, too. You could also use the CTE in my last example inside a function, just with unnest(arr) instead of a VALUES clause.
Upvotes: 2