Reputation: 1247
This is my (extremely simplified) product table and some test data.
drop table if exists product cascade;
create table product (
product_id integer not null,
reference varchar,
price decimal(13,4),
primary key (product_id)
);
insert into product (product_id, reference, price) values
(1001, 'MX-232', 100.00),
(1011, 'AX-232', 20.00),
(1003, 'KKK 11', 11.00),
(1004, 'OXS SUPER', 0.35),
(1005, 'ROR-MOT', 200.00),
(1006, '234PPP', 30.50),
(1007, 'T555-NS', 110.25),
(1008, 'LM234-XS', 101.20),
(1009, 'MOTOR-22', 12.50),
(1010, 'MOTOR-11', 30.00),
(1002, 'XUL-XUL1', 40.00);
I real life, listing product columns is a taught task, full of joins, case-when-end clauses, etc. On the other hand, there is a large number of queries to be fulfilled, as products by brand, featured products, products by title, by tags, by range or price, etc.
I don't want to repeat and maintain the complex product column listings every time I perform a query so, my current approach is breaking query processes in two tasks:
select_products_by_xxx()
, that return product_id
arrays, properly selected and ordered.list_products()
that takes a product_id array
as a parameter.select * from list_products(select_products_by_xxx())
to obtain the desired result for every xxx
function.For example, to select product_id
in reverse order (in case this was any meaningful select for the application), a function like this would do the case.
create or replace function select_products_by_inverse ()
returns int[]
as $$
select
array_agg(product_id order by product_id desc)
from
product;
$$ language sql;
It can be tested to work as
select * from select_products_by_inverse();
select_products_by_inverse |
--------------------------------------------------------|
{1011,1010,1009,1008,1007,1006,1005,1004,1003,1002,1001}|
To encapsulate the "listing" part of the query I use this function (again, extremely simplified and without any join or case for the benefit of the example).
create or replace function list_products (
tid int[]
)
returns table (
id integer,
reference varchar,
price decimal(13,4)
)
as $$
select
product_id,
reference,
price
from
product
where
product_id = any (tid);
$$ language sql;
It works, but does not respect the order of products in the passed array.
select * from list_products(select_products_by_inverse());
id |reference|price |
----|---------|--------|
1001|MX-232 |100.0000|
1011|AX-232 | 20.0000|
1003|KKK 11 | 11.0000|
1004|OXS SUPER| 0.3500|
1005|ROR-MOT |200.0000|
1006|234PPP | 30.5000|
1007|T555-NS |110.2500|
1008|LM234-XS |101.2000|
1009|MOTOR-22 | 12.5000|
1010|MOTOR-11 | 30.0000|
1002|XUL-XUL1 | 40.0000|
So, the problem is I am passing a custom ordered array of product_id
but the list_products()
function does not respect the order inside the array.
Obviously, I could include an order by
clause in list_products()
, but remember that the ordering must be determined by the select_products_by_xxx()
functions to keep the list_products()
unique.
Any idea?
EDIT
@adamkg solution is simple and works: adding a universal order by clause like this:
order by array_position(tid, product_id);
However, this means to ordering products twice: first inside select_products_by_xxx()
and then inside list_products()
.
An explain
exploration renders the following result:
QUERY PLAN |
----------------------------------------------------------------------|
Sort (cost=290.64..290.67 rows=10 width=56) |
Sort Key: (array_position(select_products_by_inverse(), product_id))|
-> Seq Scan on product (cost=0.00..290.48 rows=10 width=56) |
Filter: (product_id = ANY (select_products_by_inverse())) |
Now I am wondering if there is any other better approach to reduce cost, keeping separability between functions.
I see two promising strategies:
explain
clause and the issue itself, it seems that an complete scan of table product
is being done inside list_products()
. As there may be thousands of products, a better approach would be to scan the passed array instead.xxx
functions can be refactored to return setof int
instead of int[]
. However, a set cannot be passed as a function parameter.Upvotes: 0
Views: 1668
Reputation: 656982
For long arrays you typically get (much!) more efficient query plans with unnesting the array and joining to the main table. In simple cases, this even preserves the original order of the array without adding ORDER BY
. Rows are processed in order. But there are no guarantees and the order may be broken with more joins or with parallel execution etc. To be sure, add WITH ORDINALITY
:
CREATE OR REPLACE FUNCTION list_products (tid int[]) -- VARIADIC?
RETURNS TABLE (
id integer,
reference varchar,
price decimal(13,4)
)
LANGUAGE sql STABLE AS
$func$
SELECT product_id, p.reference, p.price
FROM unnest(tid) WITH ORDINALITY AS t(product_id, ord)
JOIN product p USING (product_id) -- LEFT JOIN ?
ORDER BY t.ord
$func$;
Fast, simple, safe. See:
You might want to throw in the modifier VARIADIC
, so you can call the function with an array or a list of IDs (max 100 items by default). See:
I would declare STABLE
function volatility.
You might use LEFT JOIN
instead of JOIN
to make sure that all given IDs are returned - with NULL values if a row with given ID has gone missing.
db<>fiddle here
Note a subtle logic difference with duplicates in the array. While product_id
is UNIQUE
...
product_id = any (tid)
folds duplicates. (One of the reasons it typically results in more expensive query plans.)If there are no dupes in the given array, there is no difference. If there can be duplicates and you want to fold them, your task is ambiguous, as it's undefined which position to keep.
Upvotes: 2
Reputation: 14081
You're very close, all you need to add is ORDER BY array_position(tid, product_id)
.
testdb=# create or replace function list_products (
tid int[]
)
returns table (
id integer,
reference varchar,
price decimal(13,4)
)
as $$
select
product_id,
reference,
price
from
product
where
product_id = any (tid)
-- add this:
order by array_position(tid, product_id);
$$ language sql;
CREATE FUNCTION
testdb=# select * from list_products(select_products_by_inverse());
id | reference | price
------+-----------+----------
1011 | AX-232 | 20.0000
1010 | MOTOR-11 | 30.0000
1009 | MOTOR-22 | 12.5000
1008 | LM234-XS | 101.2000
1007 | T555-NS | 110.2500
1006 | 234PPP | 30.5000
1005 | ROR-MOT | 200.0000
1004 | OXS SUPER | 0.3500
1003 | KKK 11 | 11.0000
1002 | XUL-XUL1 | 40.0000
1001 | MX-232 | 100.0000
(11 rows)
Upvotes: 1