Reputation: 8525

How to get columns which are not in GROUP BY?

I have a Postgresql database where I have these two tables.

shipping_method

id | name     | abbrev
---+----------+-------
1  | Standard | ST
2  | Express  | EX

shipping_details:

id	shipping_method_id	estimated_time_min	estimated_time_max	price
2	1	02:00:00	04:00:00	230
3	2	00:03:00	01:00:00	500
4	1	02:00:00	04:00:00	1230
5	1	02:00:00	04:00:00	850
6	2	01:00:00	02:00:00	1785

My goal is to fetch the most expensive shipping details per shipping method (for a specific product [not in OP]).

So far, I wrote this query:

SELECT 
    sm.id, sm.name, MAX(sd.price) AS max_price
FROM 
    shipping_details AS sd 
LEFT JOIN 
    shipping_method AS sm ON sm.id = sd.shipping_method_id
GROUP BY 
    sm.id

which returns:

id | name     | max_price
---+----------+---------
2  | Express  | 1785
1  | Standard | 1230

Through that query, I am not able to get the shipping_details columns without putting them in GROUP BY clause. I mainly need shipping details for each specific shipping method that has the higher price.

How can I achieve that?

Upvotes: 0

Answers (3)

Erwin Brandstetter

Reputation: 656882

To get additional columns from each row with the highest price in shipping_details, use DISTINCT ON:

SELECT sm.id, sm.name, sd.*
FROM   shipping_method sm
LEFT   JOIN (
   SELECT DISTINCT ON (shipping_method_id)
          shipping_method_id AS id, price AS max_price
      --  add more columns as you like
   FROM   shipping_details sd
   ORDER  BY sd.shipping_method_id DESC, sd.price DESC, sd.id  -- ①
   ) sd USING (id);

As long as all rows from shipping_details are involved, it's typically fastest to aggregate first, and then join. (Not when the table holds many additional rows that are eliminated by the join.)

If price can be NULL make that ORDER BY ... price DESC NULLS LAST - else NULL sorts on top in descending order. Be sure to match an existing index.

① An index on shipping_details (shipping_method_id, prize) makes it fast if the table shipping_details is big. Or an index on (shipping_method_id DESC, prize DESC). It's important that the sort order of both columns are in sync with the query. Postgres can scan the index forwards or backwards, but for multicolumn indices the sort order of all columns needs to be in sync with the query. See:

Optimizing queries on a range of timestamps (two columns)

One more issue to note: If there can be multiple shipping details with the highest price, you get an arbitrary pick which can change with every execution, typically after writes to involved rows. To get a stable, deterministic result, add more ORDER BY expressions as tiebreaker. Like sd.id I appended above. Then the smallest id is the winner, consistently.
If there are many ties like that, it even pays to append id to the index. Like (shipping_method_id, prize, id DESC) - note the opposite sort order for id!

There may be (much) faster techniques, depending on undisclosed details. Assuming many rows per shipping method, and an applicable index as discussed, this should be much faster:

SELECT sm.id, sm.name, sd.*
FROM   shipping_method sm
LEFT   JOIN LATERAL (
   SELECT sd.price AS max_price, id AS shipping_details_id
   FROM   shipping_details sd
   WHERE  sd.shipping_method_id = sm.id
   ORDER  BY sd.price DESC NULLS LAST
   LIMIT  1
   ) sd ON true;

What is the difference between LATERAL JOIN and a subquery in PostgreSQL?
Optimize GROUP BY query to retrieve latest row per user
Select first row in each GROUP BY group? (note performance of query nr. 5!)

Upvotes: 1

eshirvana

Reputation: 24568

here is one way using window function:

select *
from shipping_method sm 
join (
      select *, row_number() over (partition by shipping_method_id order by price desc) rn  
      from shipping_details sd) t 
on sd.shipping_method_id = t.id
and rn = 1 ;

Upvotes: 0

Tim Biegeleisen

Reputation: 521409

Use DISTINCT ON:

SELECT DISTINCT ON (sm.id) sm.id, sm.name, sd.price AS max_price
FROM shipping_details AS sd 
LEFT JOIN shipping_method AS sm
    ON sm.id = sd.shipping_method_id
ORDER BY sm.id, sd.price DESC;

The above logic will return the shipping method having the max price.

Upvotes: 2

How to get columns which are not in GROUP BY?

Answers (3)

Related Questions