Reputation: 2699
Here is the query that I am trying,
SELECT s.id, s.name AS name,
CASE WHEN (ARRAY_AGG(tgs.status) @> '{Hard} ') THEN 'Hard'
WHEN (ARRAY_AGG(tgs.status) @> '{Soft} ') THEN 'Soft'
WHEN (ARRAY_AGG(tgs.status) @> '{Fluid} ') THEN 'Fluid'
WHEN (ARRAY_AGG(tgs.status) @> '{Gummy} ') THEN 'Gummy'
WHEN (ARRAY_AGG(tgs.status) @> '{Expired} ') THEN 'Expired'
END AS status,
COUNT(*) OVER()
FROM sweets AS s
INNER JOIN tasty_goofy_sweets AS tgs on tgs.sweet_id = s.id
GROUP BY s.id;
While implementing this my friend suggested that, instead of computing array_agg every time in the switch case, we could use LEFT JOIN LATERAL and compute it just once. i.e) to implement like below
SELECT s.id, s.name AS name,
CASE WHEN (tgs.status @> '{Hard} ') THEN 'Hard'
WHEN (tgs.arr_status @> '{Soft} ') THEN 'Soft'
WHEN (tgs.arr_status @> '{Fluid} ') THEN 'Fluid'
WHEN (tgs.arr_status @> '{Gummy} ') THEN 'Gummy'
WHEN (tgs.arr_status @> '{Expired} ') THEN 'Expired'
END AS status,
COUNT(*) OVER()
FROM sweets AS s
LEFT JOIN LATERAL ( SELECT ARRAY_AGG(tgs.status) AS arr_status FROM tasty_goofy_sweets tgs WHERE tgs.sweet_id = s.id
) AS tgs ON TRUE
GROUP BY s.id;
But I am not sure if Postgres computes the ARRAY_AGG
value every time, how can we decide which approach is better? I tried looking at explain analyse
for both the queries, the number of rows involved in the latter query is more than the former. But I don't understand why this is so?
I intuitively feel the former approach is better, but can someone please reason out, which is better and why or am I asking too much ?
Upvotes: 2
Views: 807
Reputation: 1269743
I am fairly certain that in a case
expression, the when
clause is going to be evaluated separately for each condition. That means that your colleague is correct . . . probably.
The operative part of the documentation is:
Each condition is an expression that returns a boolean result. If the condition's result is true, the value of the CASE expression is the result that follows the condition, and the remainder of the CASE expression is not processed.
It is possible that Postgres does do some sort of optimization of subexpressions by evaluating them once. However, the statement: "the remainder of the CASE expression is not processed" is pretty strong and suggests that each clause will only be processed after the previous ones have evaluated to false (or NULL
).
Regardless of whether the optimizer picks figures out that a function can be called only once, the lateral join guarantees that it will be evaluated once, so that seems like the safer solution for an expensive operation.
Upvotes: 1
Reputation: 222462
Most likely, Postgres will optimize away the multiple array_agg()
s, compute it only once and reuse the results in each comparison. That's quite simple query optimization, that the database should easily spot.
Let me suggest, however, to simplify the query by using conditional aggreagation. Yo don't neeed to aggregate into an array just to check if a given value is there:
select
s.id,
s.name
case
when count(*) filter(where status = 'Hard') > 0 then 'Hard',
when count(*) filter(where status = 'Soft') > 0 then 'Soft',
when count(*) filter(where status = 'Fluid') > 0 then 'Fluid'
when count(*) filter(where status = 'Gummy') > 0 then 'Gummy',
when count(*) filter(where status = 'Expired') > 0 then 'Expired'
end status,
count(*) over() cnt
from sweets s
inner join tasty_goofy_sweets AS tgs on tgs.sweet_id = s.id
group by s.id;
You could also express this without aggregation, using a lateral join and a conditional sort:
select
s.id,
s.name,
tgs.status,
count(*) over() cnt
from sweets s
cross join lateral (
select status
from tasty_goofy_sweets as tgs
where tgs.sweet_id = s.id
order by case status
when 'Hard' then 1
when 'Soft' then 2
when 'Fluid' then 3
when 'Gummy' then 4
when 'Expired' then 5
end
limit 1
) tgs
Upvotes: 2