Reputation: 1949
I wonder whether there is a way to shorten the SELECT CASE query below by replacing the repeating arithmetic ((table1.col_a + table2.col_b) / 2) with something like a variable?
SELECT
    CASE
        WHEN ((table1.col_a + table2.col_b) / 2) < 100 THEN 1
        WHEN ((table1.col_a + table2.col_b) / 2) < 200 THEN 2
        WHEN ((table1.col_a + table2.col_b) / 2) < 250 THEN 3
        WHEN ((table1.col_a + table2.col_b) / 2) < 300 THEN 4
        WHEN ((table1.col_a + table2.col_b) / 2) < 800 THEN 5
        <... till 20>
    END bucket_range,
    COUNT(table1.id) as stats
FROM
    table1 INNER JOIN table2
        ON table1.column_x = table2.column_y
WHERE
    <filter conditions>
GROUP BY 1
ORDER BY bucket_range
The solution has to be a single SELECT query (on PostgreSQL 10), not a stored procedure or function, and it should not impact performance.
I tried the following, but both are invalid:
SELECT
CASE ((table1.col_a + table2.col_b) / 2)
WHEN < 100 THEN 1
WHEN < 200 THEN 2
and
SELECT
CASE ((table1.col_a + table2.col_b) / 2) AS x
WHEN x < 100 THEN 1
WHEN x < 200 THEN 2
--- Update note
The comparison arithmetic < bound_number THEN 1 was just a simplified example. The actual bucket sizes are not consistent; I have updated the question to clarify this. The point is that the arithmetic expression repeats across the cases.
Upvotes: 0
Views: 360
Reputation: 14861
I will present a different approach altogether. Since your interval range values are not constant, any simple calculation with them will fail. Rather than having a long list of WHEN conditions, you can generate a set of integer ranges and join those ranges with your current join to determine which range your calculated value falls within. The big advantage over the CASE option is that the query is virtually impervious to changes in the values: just change the array (making sure the values are in order).
The following takes your list of break points as an integer array, unrolls it, appends a NULL value to the start and end, then generates a list of ranges within a CTE. The main query joins these ranges with your existing query and sorts on the range lower bound.
with buckets (bucket_range) as
(
    -- break points as an int array, unrolled, with a NULL prepended and appended
    with list (num) as
    (
        select null union all
        select unnest('{100,200,250,300,800}'::int[]) union all
        select null
    )
    -- pair each break point with the next one to form a half-open range
    select int4range(num, lead(num) over (), '[)')
    from list
) --select * from buckets;
select (table1.col_a + table2.col_b) / 2 as col_sum,
       table1.id
from table1
join table2 on table1.column_x = table2.column_y
join buckets on ((table1.col_a + table2.col_b) / 2) <@ bucket_range
where
      <filter conditions>
  and not (lower(bucket_range) is null and upper(bucket_range) is null)
order by lower(bucket_range) nulls first;
I have created a demonstration. Since you did not provide input data, I didn't try to produce your expected results; I just generated something and modified the selected columns to show the relationships. But it does show the query in operation.
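If you want the result in the same shape as the question (one count per bucket), the same join should work with a GROUP BY on the range; a rough sketch reusing the buckets CTE from above (untested against your data):
-- reuses the buckets CTE defined above
select bucket_range,
       count(table1.id) as stats
from table1
join table2 on table1.column_x = table2.column_y
join buckets on ((table1.col_a + table2.col_b) / 2) <@ bucket_range
where
      <filter conditions>
  and not (lower(bucket_range) is null and upper(bucket_range) is null)
group by bucket_range
order by lower(bucket_range) nulls first;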
Upvotes: 0
Reputation: 164089
You can replace the whole CASE expression with:
SELECT CEILING((table1.col_a + table2.col_b) / 2 / 100),
..............................
assuming that either of col_a and col_b is a real number, so the division is not integer division.
If they are both integers then divide by 2.0:
SELECT CEILING((table1.col_a + table2.col_b) / 2.0 / 100),
..............................
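For example, plugging in a made-up value of 250 shows the difference:
-- 250 / 2 / 100 is integer division and truncates to 1, so CEILING returns 1;
-- dividing by 2.0 keeps the fraction (1.25), so CEILING returns 2
SELECT CEILING(250 / 2 / 100)   AS with_integer_division,
       CEILING(250 / 2.0 / 100) AS with_numeric_division;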
Upvotes: 0
Reputation: 521053
If I understand the math correctly, you should be able to just divide by 100 and take the floor:
SELECT
1 + (((table1.col_a + table2.col_b) / 2) / 100) AS bucket_range,
COUNT(table1.id) AS stats
FROM table1
INNER JOIN table2
ON table1.column_x = table2.column_y
WHERE
<filter conditions>
GROUP BY 1
ORDER BY
bucket_range;
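For example, with a couple of made-up averaged values (assuming col_a and col_b are integers, so / truncates):
-- 99  gives 1 + (99 / 100)  = 1, matching WHEN ... < 100 THEN 1
-- 150 gives 1 + (150 / 100) = 2, matching WHEN ... < 200 THEN 2
SELECT 1 + (99 / 100)  AS bucket_for_99,
       1 + (150 / 100) AS bucket_for_150;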
Upvotes: 2
Reputation:
You can use a derived table and then refer to the column alias in the outer query:
SELECT CASE
           WHEN col_sum < 100 THEN 1
           WHEN col_sum < 200 THEN 2
           WHEN col_sum < 300 THEN 3
           WHEN col_sum < 400 THEN 4
           WHEN col_sum < 500 THEN 5
           <... till 20>
       END bucket_range,
       COUNT(id) as stats
FROM (
    SELECT (table1.col_a + table2.col_b) / 2 as col_sum,
           table1.id
    FROM table1
    JOIN table2 ON table1.column_x = table2.column_y
    WHERE
        <filter conditions>
) d
GROUP BY 1
ORDER BY bucket_range
Unrelated, but: if table1.id is defined as NOT NULL then you can also use count(*), which will be slightly faster than count(id) (if table1.id can contain null values, the two expressions wouldn't be equivalent).
If the buckets are all of the same size, you might want to have a look at the width_bucket() function, which achieves a similar thing without writing a lengthy CASE expression. Something like:
width_bucket((table1.col_a + table2.col_b) / 2, 0, 10000, 20)
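For example, with 20 equal buckets of width 100 (the bounds 0 and 2000 are just illustrative):
-- values below 100 land in bucket 1, values from 100 to 199 in bucket 2, and so on
SELECT width_bucket(99, 0, 2000, 20)  AS bucket_for_99,   -- 1
       width_bucket(150, 0, 2000, 20) AS bucket_for_150;  -- 2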
Upvotes: 0