coterobarros
coterobarros

Reputation: 1247

How to respect the order of an array in a PostgreSQL select sentence

This is my (extremely simplified) product table and some test data.

drop table if exists product cascade;

create table product (
  product_id  integer not null,
  reference   varchar,
  price       decimal(13,4),
  
  primary key (product_id)
);

insert into product (product_id, reference, price) values 
(1001, 'MX-232',    100.00),
(1011, 'AX-232',     20.00),
(1003, 'KKK 11',     11.00),
(1004, 'OXS SUPER',   0.35),
(1005, 'ROR-MOT',   200.00),
(1006, '234PPP',     30.50),
(1007, 'T555-NS',   110.25),
(1008, 'LM234-XS',  101.20),
(1009, 'MOTOR-22',   12.50),
(1010, 'MOTOR-11',   30.00),
(1002, 'XUL-XUL1',   40.00);

I real life, listing product columns is a taught task, full of joins, case-when-end clauses, etc. On the other hand, there is a large number of queries to be fulfilled, as products by brand, featured products, products by title, by tags, by range or price, etc.

I don't want to repeat and maintain the complex product column listings every time I perform a query so, my current approach is breaking query processes in two tasks:

For example, to select product_id in reverse order (in case this was any meaningful select for the application), a function like this would do the case.

create or replace function select_products_by_inverse () 
returns int[]
as $$
  select 
    array_agg(product_id order by product_id desc)  
  from 
    product;
$$ language sql;

It can be tested to work as

select * from select_products_by_inverse();

select_products_by_inverse                              |
--------------------------------------------------------|
{1011,1010,1009,1008,1007,1006,1005,1004,1003,1002,1001}|

To encapsulate the "listing" part of the query I use this function (again, extremely simplified and without any join or case for the benefit of the example).

create or replace function list_products (
    tid int[]
) 
returns table (
  id        integer,
  reference varchar,
  price     decimal(13,4)
)
as $$
  select
    product_id,
    reference,
    price
  from
    product
  where
    product_id = any (tid);
$$ language sql;

It works, but does not respect the order of products in the passed array.

select * from list_products(select_products_by_inverse());

id  |reference|price   |
----|---------|--------|
1001|MX-232   |100.0000|
1011|AX-232   | 20.0000|
1003|KKK 11   | 11.0000|
1004|OXS SUPER|  0.3500|
1005|ROR-MOT  |200.0000|
1006|234PPP   | 30.5000|
1007|T555-NS  |110.2500|
1008|LM234-XS |101.2000|
1009|MOTOR-22 | 12.5000|
1010|MOTOR-11 | 30.0000|
1002|XUL-XUL1 | 40.0000|

So, the problem is I am passing a custom ordered array of product_id but the list_products() function does not respect the order inside the array.

Obviously, I could include an order by clause in list_products(), but remember that the ordering must be determined by the select_products_by_xxx() functions to keep the list_products() unique.

Any idea?


EDIT

@adamkg solution is simple and works: adding a universal order by clause like this:

order by array_position(tid, product_id);

However, this means to ordering products twice: first inside select_products_by_xxx() and then inside list_products().

An explain exploration renders the following result:

QUERY PLAN                                                            |
----------------------------------------------------------------------|
Sort  (cost=290.64..290.67 rows=10 width=56)                          |
  Sort Key: (array_position(select_products_by_inverse(), product_id))|
  ->  Seq Scan on product  (cost=0.00..290.48 rows=10 width=56)       |
        Filter: (product_id = ANY (select_products_by_inverse()))     |

Now I am wondering if there is any other better approach to reduce cost, keeping separability between functions.

I see two promising strategies:

Upvotes: 0

Views: 1668

Answers (2)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 656982

For long arrays you typically get (much!) more efficient query plans with unnesting the array and joining to the main table. In simple cases, this even preserves the original order of the array without adding ORDER BY. Rows are processed in order. But there are no guarantees and the order may be broken with more joins or with parallel execution etc. To be sure, add WITH ORDINALITY:

CREATE OR REPLACE FUNCTION list_products (tid int[])  -- VARIADIC?
  RETURNS TABLE (
   id        integer,
   reference varchar,
   price     decimal(13,4)
   )
  LANGUAGE sql STABLE AS
$func$
  SELECT product_id, p.reference, p.price
  FROM   unnest(tid) WITH ORDINALITY AS t(product_id, ord)
  JOIN   product p USING (product_id)  -- LEFT JOIN ?
  ORDER  BY t.ord
$func$;

Fast, simple, safe. See:

You might want to throw in the modifier VARIADIC, so you can call the function with an array or a list of IDs (max 100 items by default). See:

I would declare STABLE function volatility.

You might use LEFT JOIN instead of JOIN to make sure that all given IDs are returned - with NULL values if a row with given ID has gone missing.

db<>fiddle here

Note a subtle logic difference with duplicates in the array. While product_id is UNIQUE ...

  • unnest + left join returns exactly one row for every given ID - preserving duplicates in the given IDs if any.
  • product_id = any (tid) folds duplicates. (One of the reasons it typically results in more expensive query plans.)

If there are no dupes in the given array, there is no difference. If there can be duplicates and you want to fold them, your task is ambiguous, as it's undefined which position to keep.

Upvotes: 2

AdamKG
AdamKG

Reputation: 14081

You're very close, all you need to add is ORDER BY array_position(tid, product_id).

testdb=# create or replace function list_products (
    tid int[]
) 
returns table (
  id        integer,
  reference varchar,
  price     decimal(13,4)
)
as $$
  select
    product_id,
    reference,
    price
  from
    product
  where
    product_id = any (tid)
-- add this:
order by array_position(tid, product_id);
$$ language sql;
CREATE FUNCTION
testdb=# select * from list_products(select_products_by_inverse());
  id  | reference |  price   
------+-----------+----------
 1011 | AX-232    |  20.0000
 1010 | MOTOR-11  |  30.0000
 1009 | MOTOR-22  |  12.5000
 1008 | LM234-XS  | 101.2000
 1007 | T555-NS   | 110.2500
 1006 | 234PPP    |  30.5000
 1005 | ROR-MOT   | 200.0000
 1004 | OXS SUPER |   0.3500
 1003 | KKK 11    |  11.0000
 1002 | XUL-XUL1  |  40.0000
 1001 | MX-232    | 100.0000
(11 rows)



Upvotes: 1

Related Questions