shiouming

Reputation: 1949

How to simplify repeating arithmetic in SQL SELECT CASE query

Is there a way to shorten the SELECT CASE query below, by replacing the repeating arithmetic ((table1.col_a + table2.col_b) / 2) with something like a variable?

SELECT
  CASE
    WHEN ((table1.col_a + table2.col_b) / 2) < 100 THEN 1
    WHEN ((table1.col_a + table2.col_b) / 2) < 200 THEN 2
    WHEN ((table1.col_a + table2.col_b) / 2) < 250 THEN 3
    WHEN ((table1.col_a + table2.col_b) / 2) < 300 THEN 4
    WHEN ((table1.col_a + table2.col_b) / 2) < 800 THEN 5
    <... till 20>
  END bucket_range,
  COUNT(table1.id) AS stats
FROM
  table1 INNER JOIN table2
    ON table1.column_x = table2.column_y
WHERE
  <filter conditions>
GROUP BY 1
ORDER BY bucket_range

The solution has to be a single SELECT query (on PostgreSQL 10), not a stored procedure or function, and it should not hurt performance.

I tried the following, but both are invalid:

SELECT
  CASE ((table1.col_a + table2.col_b) / 2)
    WHEN < 100 THEN 1
    WHEN < 200 THEN 2

and

SELECT
  CASE ((table1.col_a + table2.col_b) / 2) AS x
    WHEN x < 100 THEN 1
    WHEN x < 200 THEN 2

--- Update note

The comparison arithmetic < bound_number THEN 1 was just a simplified example; the actual bucket sizes are not uniform, and I have updated the question to clarify this. The point is that the arithmetic expression repeats across the cases.

Upvotes: 0

Views: 360

Answers (4)

Belayer

Reputation: 14861

I will present a different approach altogether. Since your interval boundaries are not evenly spaced, no simple calculation on them will work. Rather than writing a long list of WHEN conditions, you can generate a set of integer ranges, then join those ranges into your current join to determine which range your calculated value falls within. The big advantage over the CASE option is that the query is virtually impervious to changes in the boundary values: just change the array (making sure the values are in order).

The following takes your list of break points as an integer array, unrolls it, adds a null value at the start and the end, then generates a list of ranges within a CTE. The main query joins these ranges with your existing query and sorts on the range's lower bound:

with buckets(bucket_range) as
  (with list(num) as
        (select null union all
         select unnest('{100,200,250,300,800}'::int[]) union all
         select null)
   select int4range(num, lead(num) over (), '[)')
   from list
  )  -- select * from buckets;
select (table1.col_a + table2.col_b) / 2 as col_sum,
       table1.id
from table1
join table2 on table1.column_x = table2.column_y
join buckets on ((table1.col_a + table2.col_b) / 2) <@ bucket_range
where
  <filter conditions>
  and not (lower(bucket_range) is null and upper(bucket_range) is null)
order by lower(bucket_range) nulls first;

I have created a demonstration. Since you did not provide input data, I didn't try to produce your expected results; I just generated something and modified the selected columns to show the relationships, but it does show the query in operation.
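
As a rough standalone sketch of what the buckets CTE produces (using the break points from the question; everything else is illustrative):

with list(num) as
     (select null::int union all
      select unnest('{100,200,250,300,800}'::int[]) union all
      select null)
select int4range(num, lead(num) over (), '[)') as bucket_range
from list;
-- yields: (,100) [100,200) [200,250) [250,300) [300,800) [800,) (,)
-- the final (,) row comes from the trailing null; it is the one the
-- main query's WHERE clause filters out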

Upvotes: 0

forpas

Reputation: 164089

You can replace the whole CASE expression with:

SELECT CEILING((table1.col_a + table2.col_b) / 2 / 100),
       ..............................

assuming that at least one of col_a and col_b is a real number, so the division is not integer division.

If they are both integers, then divide by 2.0:

SELECT CEILING((table1.col_a + table2.col_b) / 2.0 / 100),
       ..............................
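
As a quick sanity check of the mapping (made-up values; v stands for col_a + col_b):

SELECT v, CEILING(v / 2.0 / 100) AS bucket
FROM (VALUES (50), (199), (398), (400)) AS t(v);
-- v=50  -> avg 25    -> bucket 1
-- v=199 -> avg 99.5  -> bucket 1
-- v=398 -> avg 199   -> bucket 2
-- v=400 -> avg 200   -> bucket 2 (an exact boundary such as 200 lands in
--          the lower bucket here, while the CASE version puts it in the
--          next one, since 200 < 200 is false)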

Upvotes: 0

Tim Biegeleisen

Reputation: 521053

If I understand the math correctly, you should be able to just divide by 100 and take the floor:

SELECT
    1 + (((table1.col_a + table2.col_b) / 2) / 100) AS bucket_range,
    COUNT(table1.id) AS stats
FROM table1
INNER JOIN table2
    ON table1.column_x = table2.column_y
WHERE
    <filter conditions>
GROUP BY 1
ORDER BY
    bucket_range;
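
A quick way to convince yourself of the arithmetic (integer division truncates in PostgreSQL; made-up values, with v standing for col_a + col_b):

SELECT v, 1 + (v / 2) / 100 AS bucket_range
FROM (VALUES (50), (199), (200), (399), (400)) AS t(v);
-- v=50  -> avg 25  -> bucket 1
-- v=199 -> avg 99  -> bucket 1
-- v=200 -> avg 100 -> bucket 2
-- v=399 -> avg 199 -> bucket 2
-- v=400 -> avg 200 -> bucket 3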

Upvotes: 2

user330315

Reputation:

You can use a derived table and then refer to the column alias in the outer query:

SELECT CASE
          WHEN col_sum < 100 THEN 1
          WHEN col_sum < 200 THEN 2
          WHEN col_sum < 300 THEN 3
          WHEN col_sum < 400 THEN 4
          WHEN col_sum < 500 THEN 5
          <... till 20>
        END bucket_range,
        COUNT(id) as stats
FROM (
  SELECT (table1.col_a + table2.col_b) / 2 as col_sum, 
         table1.id
  FROM table1 
    JOIN table2 ON table1.column_x = table2.column_y
  WHERE
    <filter conditions>
) d
GROUP BY 1
ORDER BY bucket_range

Unrelated, but: if table1.id is defined as NOT NULL, you can also use count(*), which will be slightly faster than count(id). (If table1.id can contain null values, the two expressions wouldn't be equivalent.)
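
A minimal illustration of the difference (made-up values):

SELECT count(*) AS all_rows, count(id) AS non_null_ids
FROM (VALUES (1), (NULL), (3)) AS t(id);
-- all_rows = 3, non_null_ids = 2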


If the buckets are all of the same size, you might want to have a look at the width_bucket() function, which achieves a similar thing but without writing a lengthy CASE expression.

Something like:

 width_bucket((table1.col_a + table2.col_b) / 2, 0, 10000, 20)
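
For instance, a minimal runnable illustration of that four-argument form (bounds 0 and 10000, 20 equal-width buckets, made-up values for the computed average):

SELECT v, width_bucket(v, 0, 10000, 20) AS bucket_range
FROM (VALUES (50), (499), (500), (9999), (12000)) AS t(v);
-- 50 -> 1, 499 -> 1, 500 -> 2, 9999 -> 20, 12000 -> 21 (above the top)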

Upvotes: 0
