How to include column which is not a part of group by

Question

How do I get the max of w_cost per v_id while also including av_id in the final result set?

s_id sg_id  r_cost  w_cost  av_id v_id
123  100    0.50    1.00    1     333
123  105    0.75    0.50    2     333
123  330    2.00    NULL    3     888

If w_cost is NULL, r_cost should be taken. The final result should be:

s_id v_id   w_cost  av_id
123  333     1.00   1
123  888     2.00   3

My basic query is:

SELECT
t.s_id,
sv.v_id,
sv.w_cost,
CASE
  WHEN sv.w_cost IS NULL THEN
    sv.r_cost::numeric
  ELSE sv.w_cost::numeric
  END AS cost
FROM test t
INNER JOIN stra_ven sv ON
t.s_id = sv.s_id
GROUP BY t.s_id,sv.v_id,sv.w_cost;

S-Man · Accepted Answer

Window Functions:

This is what window functions are made for: https://www.postgresql.org/docs/current/static/tutorial-window.html

SELECT 
    s_id, v_id, w_cost, av_id
FROM
    (SELECT 
        s_id,
        v_id,
        av_id,
        COALESCE(w_cost, r_cost) as w_cost,                                    -- A
        MAX(COALESCE(w_cost, r_cost)) OVER (PARTITION BY v_id) as max_w_cost   -- B
     FROM testdata) s
WHERE 
    max_w_cost = w_cost                                                        -- C

A: COALESCE returns the first non-NULL value in its argument list. So if w_cost is NULL, r_cost is taken.

B: The window function MAX() returns the maximum value within each v_id partition. It uses the same COALESCE expression as in (A).

C: The WHERE clause keeps only the rows whose own cost equals the maximum of their partition.
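To try the query locally, a minimal setup could look like the sketch below. The table name testdata comes from the query above and the rows come from the question; the column types are assumptions, since the original schema is not shown.

```sql
-- Hypothetical schema: column types are assumed, not taken from the question.
CREATE TABLE testdata (
    s_id   integer,
    sg_id  integer,
    r_cost numeric,
    w_cost numeric,
    av_id  integer,
    v_id   integer
);

-- Sample rows from the question.
INSERT INTO testdata (s_id, sg_id, r_cost, w_cost, av_id, v_id) VALUES
    (123, 100, 0.50, 1.00, 1, 333),
    (123, 105, 0.75, 0.50, 2, 333),
    (123, 330, 2.00, NULL, 3, 888);
```

With these three rows, the window function query keeps av_id 1 for v_id 333 (cost 1.00) and av_id 3 for v_id 888 (cost 2.00, taken from r_cost). The order of the returned rows is not fixed unless you add an outer ORDER BY.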

If several rows share the same maximum value, this query returns all of them. If you want only one of them, you can add a column to the partition to make the window more precise, order by something and take just the first row, or pick a more or less arbitrary one with DISTINCT ON.
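The "order by something and take the first one" option can be sketched with ROW_NUMBER(). This is only one possible variant: using av_id as the tie-breaker is an assumption, so swap in whatever column decides which of the tied rows you want.

```sql
SELECT s_id, v_id, cost AS w_cost, av_id
FROM (
    SELECT
        s_id,
        v_id,
        av_id,
        COALESCE(w_cost, r_cost) AS cost,
        -- Number the rows of each v_id, highest cost first.
        -- av_id is an assumed tie-breaker; NULLS LAST keeps rows
        -- where both costs are NULL from being ranked first.
        ROW_NUMBER() OVER (
            PARTITION BY v_id
            ORDER BY COALESCE(w_cost, r_cost) DESC NULLS LAST, av_id
        ) AS rn
    FROM testdata
) s
WHERE rn = 1;
```

Unlike the MAX() comparison, this returns exactly one row per v_id even when several rows share the maximum cost.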

DISTINCT ON:

DISTINCT ON keeps one row per distinct value of the listed columns (whereas plain DISTINCT compares all columns). Because row order is not guaranteed without an ORDER BY clause, the rows should be sorted by v_id and then by the final cost, greatest first (DESC), with the cost calculated by COALESCE as described above. DISTINCT ON then takes the first row of each v_id group.

SELECT DISTINCT ON (v_id)                  -- C
    s_id, v_id, cost as w_cost, av_id
FROM
    (SELECT 
        s_id,
        v_id,
        av_id,
        COALESCE(w_cost, r_cost) as cost   -- A
     FROM testdata
     ORDER BY v_id, cost DESC) s           -- B

A: COALESCE as mentioned in the window function section.

B: Ordering to get the wanted row first.

C: DISTINCT ON keeps the first row for every distinct v_id.
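A variant worth considering puts the ORDER BY at the same query level as DISTINCT ON instead of in a subquery. The PostgreSQL documentation notes that the first row of each DISTINCT ON group is unpredictable unless ORDER BY is used to make the desired row appear first, so this form relies less on the subquery's ordering being preserved. The av_id tie-breaker and NULLS LAST are additions for illustration, not part of the original answer.

```sql
SELECT DISTINCT ON (v_id)
    s_id,
    v_id,
    COALESCE(w_cost, r_cost) AS w_cost,
    av_id
FROM testdata
ORDER BY
    v_id,                                       -- must lead, to match DISTINCT ON (v_id)
    COALESCE(w_cost, r_cost) DESC NULLS LAST,   -- highest cost first
    av_id;                                      -- assumed tie-breaker
```

The ORDER BY repeats the COALESCE expression rather than the w_cost alias, so it is obvious which value is being sorted.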
