Monty_Brea
Monty_Brea

Reputation: 5

Bucketing Values using Group By Not Working?

I have a table that looks like this:

Field A | Field B | Field C | Field D | Field E | Field F
100001  |  10.00  | 2.00   | -1.00. | 7.00   | "0-9.99
100002  |  8.00   | 1.00   | NULL   | 7.00   | "1-9.99"
100003  |  40.00  | 2.00.  | NULL   | 38.00  | "30.00-39.99"
100004  |  20.00  | 1.00.  | NULL   | 19.00  | "10 - 19.99"
100005  |  30.00  | 11.00  | NULL   | 19.00. | "10 - 19.99"

Field E is a sum of fields B, C, and D - I've got that working and here's my code

Now, I want to create a calculated field Field F which buckets the values in field E based on their value. For example, row one would be in 0-9.99 but I have no idea how to do this.

Ideally, I'd like to do this ALL in one query. And the particular syntax is PostgreSQL but this particular application doesn't play nice. Even SELECT * FROM Table1; throws an error. It's likely due to my relative inexperience with this syntax but maybe you guys could tell me how to fix this query. Here's what I tried so far

  SELECT "Field A,"Field B","Field C","Field D",
  (coalesce("FIELD B", 0) - coalesce("Field C", 0) + 
  coalesce("Field D", 0)) AS "Field E" 
  From "Table1";SELECT "Field E",CASE WHEN “Field D” >= 0 and 
  “FIELD D” <= 9.99 then “0-9.99”
     WHEN “FIELD D” >= 10 and “FIELD D” <= 19.99 then “10 to 19.99”
     WHEN “Field D” >= 20 and “Field D” <= 29.99 then “20 to 29.99”
     WHEN “Field D” >= 30 and “Field D” <= 39.99 then “30 to 39.99”
     WHEN “Field D” >= 40 and “Field D” <= 49.99 then “40 to 49.99”
     ELSE “$50+”
     END FROM “Orders”;

I literally just want a column where the buckets in Field F are chosen based on the value of field E (another calculated field). All my normal queries are not working in this particular application and I'm just trying to see if there is something stupid I'm missing. Thank you.

Upvotes: 0

Views: 54

Answers (4)

unutbu
unutbu

Reputation: 880279

You could use width_bucket to find which bucket field_e fits into, thus avoiding the need to write out a possibly lengthy CASE statement:

WITH tmp AS (
    SELECT * FROM (VALUES 
    (100001  ,  10.00  , 2.00  , -1.00 ),
    (100002  ,  8.00   , 1.00  , NULL  ),
    (100003  ,  40.00  , 2.00  , NULL  ),
    (100004  ,  20.00  , 1.00  , NULL  ),
    (100005  ,  30.00  , 11.00 , NULL  )
    ) foo (field_a , field_b , field_c , field_d)
    ), cutoffs AS (
    SELECT idx, cutoff::text || '-' || COALESCE(next_cutoff::text, '+') AS label
    FROM (
        SELECT 
            ROW_NUMBER() OVER (ORDER BY cutoff) AS idx, 
            cutoff, 
            LEAD(cutoff) OVER (ORDER BY cutoff) - 0.01 AS next_cutoff
        FROM generate_series(0, 50, 10) AS cutoff) t )
SELECT field_a , field_b , field_c , field_d, field_e, label AS field_f
FROM (
    SELECT *, width_bucket(field_e, 0, 50, 5) AS idx
    FROM (
        SELECT *, COALESCE(field_b, 0) - COALESCE(field_c, 0) + COALESCE(field_d, 0) AS field_e
        FROM tmp) t1
    ) t2
INNER JOIN cutoffs
USING (idx)
ORDER BY field_a

yields

| field_a | field_b | field_c | field_d | field_e |  field_f |
|---------+---------+---------+---------+---------+----------|
|  100001 |   10.00 |    2.00 |   -1.00 |    7.00 |   0-9.99 |
|  100002 |    8.00 |    1.00 |         |    7.00 |   0-9.99 |
|  100003 |   40.00 |    2.00 |         |   38.00 | 30-39.99 |
|  100004 |   20.00 |    1.00 |         |   19.00 | 10-19.99 |
|  100005 |   30.00 |   11.00 |         |   19.00 | 10-19.99 |

Upvotes: 0

forpas
forpas

Reputation: 164139

Start from 50 and then move to lower values:

case 
  when fieldE >= 50 then '$50+'
  when fieldE >= 40 then '40-49.99'
  when fieldE >= 30 then '30-39.99'
  when fieldE >= 20 then '20-29.99'
  when fieldE >= 10 then '10-19.99'
  when fieldE >= 0 then '0-9.99'
  else ''
end

Upvotes: 1

mnesarco
mnesarco

Reputation: 2798

Yes, you have syntax errors, and also your case statement is based on D and not on E. Try this:

WITH tmp AS (
    SELECT 
        "Field A",
        "Field B",
        "Field C",
        "Field D",
        (coalesce("FIELD B", 0) - coalesce("Field C", 0) +  coalesce("Field D", 0)) AS "Field E"
    FROM "Table1"
)
SELECT 
    tmp.*, 
    CASE 
        WHEN "Field E" >=  0 and "FIELD E" <=  9.99 THEN "0-9.99"
        WHEN "FIELD E" >= 10 and "FIELD E" <= 19.99 THEN "10 to 19.99"
        WHEN "Field E" >= 20 and "Field E" <= 29.99 THEN "20 to 29.99"
        WHEN "Field E" >= 30 and "Field E" <= 39.99 THEN "30 to 39.99"
        WHEN "Field E" >= 40 and "Field E" <= 49.99 THEN "40 to 49.99"
        ELSE "$50+"
    END AS "Field F"
FROM 
    tmp

Upvotes: 0

DRapp
DRapp

Reputation: 48159

First, column names with spaces is almost ALWAYS a bad idea, even if doing sample data. Some sql-engines prefer 'single-quote' around column names vs "double".

You could nest your queries to get all the pre-calcs first, THEN do your case/when into buckets. The inner query is from your Table1 (t1 alias) that applies the computation. That result becomes alias "PQ" (prequery) and allows your case/when to be simplified by the now available "FieldE" column.

select
      PQ.FieldA,
      PQ.FieldB,
      PQ.FieldC,
      PQ.FieldD,
      PQ.FieldE,
      case when PQ.FieldE < 10 then '0 - 9.99'
           when PQ.FieldE >= 10 AND PQ.FieldE < 20 then '10 to 19.99'
           when PQ.FieldE >= 20 and PQ.FieldE < 30 then '20 to 29.99'
           when PQ.FieldE >= 30 and PQ.FieldE < 40 then '30 to 39.99'
           when PQ.FieldE >= 40 and PQ.FieldE < 50 then '40 to 49.99'
           else '$50+' end CalcBucket
   from
      ( SELECT
              T1.FieldA,
              T1.FieldB,
              T1.FieldC,
              T1.FieldD,
              coalesce( T1.FieldB, 0) 
               - coalesce(T1.FieldC, 0) 
               + coalesce(T1.FieldD, 0)) AS FieldE 
           From 
              T1 )  PQ

Upvotes: 0

Related Questions