Reputation: 8525
I have a Postgresql database where I have these two tables.
shipping_method
id | name | abbrev
---+----------+-------
1 | Standard | ST
2 | Express | EX
shipping_details
:
id | shipping_method_id | estimated_time_min | estimated_time_max | price |
---|---|---|---|---|
2 | 1 | 02:00:00 | 04:00:00 | 230 |
3 | 2 | 00:03:00 | 01:00:00 | 500 |
4 | 1 | 02:00:00 | 04:00:00 | 1230 |
5 | 1 | 02:00:00 | 04:00:00 | 850 |
6 | 2 | 01:00:00 | 02:00:00 | 1785 |
My goal is to fetch the most expensive shipping details per shipping method (for a specific product [not in OP]).
So far, I wrote this query:
SELECT
sm.id, sm.name, MAX(sd.price) AS max_price
FROM
shipping_details AS sd
LEFT JOIN
shipping_method AS sm ON sm.id = sd.shipping_method_id
GROUP BY
sm.id
which returns:
id | name | max_price
---+----------+---------
2 | Express | 1785
1 | Standard | 1230
Through that query, I am not able to get the shipping_details
columns without putting them in GROUP BY
clause. I mainly need shipping details for each specific shipping method that has the higher price.
How can I achieve that?
Upvotes: 0
Views: 626
Reputation: 656882
To get additional columns from each row with the highest price in shipping_details
, use DISTINCT ON
:
SELECT sm.id, sm.name, sd.*
FROM shipping_method sm
LEFT JOIN (
SELECT DISTINCT ON (shipping_method_id)
shipping_method_id AS id, price AS max_price
-- add more columns as you like
FROM shipping_details sd
ORDER BY sd.shipping_method_id DESC, sd.price DESC, sd.id -- ①
) sd USING (id);
As long as all rows from shipping_details
are involved, it's typically fastest to aggregate first, and then join. (Not when the table holds many additional rows that are eliminated by the join.)
If price
can be NULL
make that ORDER BY ... price DESC NULLS LAST
- else NULL
sorts on top in descending order. Be sure to match an existing index.
① An index on shipping_details (shipping_method_id, prize)
makes it fast if the table shipping_details
is big. Or an index on (shipping_method_id DESC, prize DESC)
. It's important that the sort order of both columns are in sync with the query. Postgres can scan the index forwards or backwards, but for multicolumn indices the sort order of all columns needs to be in sync with the query. See:
One more issue to note: If there can be multiple shipping details with the highest price, you get an arbitrary pick which can change with every execution, typically after writes to involved rows. To get a stable, deterministic result, add more ORDER BY
expressions as tiebreaker. Like sd.id
I appended above. Then the smallest id
is the winner, consistently.
If there are many ties like that, it even pays to append id
to the index. Like (shipping_method_id, prize, id DESC)
- note the opposite sort order for id
!
Related:
There may be (much) faster techniques, depending on undisclosed details. Assuming many rows per shipping method, and an applicable index as discussed, this should be much faster:
SELECT sm.id, sm.name, sd.*
FROM shipping_method sm
LEFT JOIN LATERAL (
SELECT sd.price AS max_price, id AS shipping_details_id
FROM shipping_details sd
WHERE sd.shipping_method_id = sm.id
ORDER BY sd.price DESC NULLS LAST
LIMIT 1
) sd ON true;
Upvotes: 1
Reputation: 24568
here is one way using window function:
select *
from shipping_method sm
join (
select *, row_number() over (partition by shipping_method_id order by price desc) rn
from shipping_details sd) t
on sd.shipping_method_id = t.id
and rn = 1 ;
Upvotes: 0
Reputation: 521409
Use DISTINCT ON
:
SELECT DISTINCT ON (sm.id) sm.id, sm.name, sd.price AS max_price
FROM shipping_details AS sd
LEFT JOIN shipping_method AS sm
ON sm.id = sd.shipping_method_id
ORDER BY sm.id, sd.price DESC;
The above logic will return the shipping method having the max price.
Upvotes: 2