Jason T

Reputation: 13

TERADATA Creating Group ID from Rolling SUM Limit

I have a list of products and a count corresponding to the quantity sold in a single table. The data is laid out as such:

Product Name  QTY_SOLD
Mouse            23
Keyboard         25
Monitor          56
TV               10
Laptop           45
...

I want to create a group ID where groups are created if the ROLLING sum of the quantity sold is greater than 50. We can order by Product Name to get an output similar to the following.

Product Name   QTY_SOLD   GROUP_NBR
Keyboard          25          1
Laptop            45          1
Monitor           56          2
Mouse             23          3
TV                10          3

I created a CASE statement to produce the output I need, but if I want to change the group ID cutoff from 50 to, say, 100, or if I get more products and quantities, I have to keep editing the CASE statement. Is there an easy way to use recursion or some other method to accommodate this? This works on Teradata 13.10.

UPDATE main FROM prod_list AS main,  
(  
SEL PROD_NAME
, QTY_SOLD
, SUM(QTY_SOLD) OVER (ORDER BY PROD_NAME ROWS UNBOUNDED PRECEDING) RUNNING  FROM prod_list   
) inr 
SET GROUP_NBR = CASE
WHEN RUNNING < 50 THEN 1
WHEN RUNNING < 100 THEN 2
WHEN RUNNING < 150 THEN 3
WHEN RUNNING < 200 THEN 4
WHEN RUNNING < 250 THEN 5
ELSE 6
END
WHERE main.PROD_NAME = inr.PROD_NAME ;

Upvotes: 1

Views: 2004

Answers (4)

dnoeth

Reputation: 60482

When I first saw your question I thought it was a kind of bin-packing problem, but your query looks like you simply want to put your data into n buckets :-)

Teradata supports the QUANTILE function, but it's deprecated and it doesn't fit your requirements, as it creates buckets of equal size. You need WIDTH_BUCKET, which creates (as the name implies) buckets of equal width:

SELECT 
    PROD_ID
,   COUNT(DISTINCT PROD_ID) AS QTY
,   SUM(QTY) OVER (ORDER BY QTY ROWS UNBOUNDED PRECEDING) RUNNING
,   WIDTH_BUCKET(RUNNING, 1, 120*2000000, 120) AS GRP_NBR
FROM TMP_WORK_DB.PROD_LIST
GROUP BY 1

You can easily change the size of a bucket (2000000) or the number of buckets (120).
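For reference, WIDTH_BUCKET's bucketing can be sketched in Python (an illustration of the semantics, not Teradata's implementation; the six-bucket bounds here are made up to mirror the question's cutoff of 50):

```python
def width_bucket(value, low, high, num_buckets):
    """Mimic SQL WIDTH_BUCKET: split [low, high) into num_buckets
    equal-width buckets. Values below low land in bucket 0; values
    at or above high land in the overflow bucket num_buckets + 1."""
    if value < low:
        return 0
    if value >= high:
        return num_buckets + 1
    width = (high - low) / num_buckets
    return int((value - low) // width) + 1

# Six buckets spanning 1..300, mirroring WIDTH_BUCKET(RUNNING, 1, 50*6, 6):
running_totals = [25, 70, 126, 149, 159]
groups = [width_bucket(r, 1, 50 * 6, 6) for r in running_totals]
```

Note that because the range starts at 1 rather than 0, each bucket is (high − low)/n wide, i.e. slightly under 50 here; the bucket edges shift accordingly, just as they do with WIDTH_BUCKET itself.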

Upvotes: 1

Jason T

Reputation: 13

This is the code I created based on Twelfth's suggestion.

-- create the first entry for the recursive query
INSERT  TMP_WORK_DB.GRP_NBRS VALUES (0,1,0,2000000);


INSERT TMP_WORK_DB.GRP_NBRS (GRP_NBR,LOWER_LIMIT, UPPER_LIMIT)
WITH RECURSIVE GRP_RECRSV (GRP_NBR, LOWER_LIMIT, UPPER_LIMIT) 
AS (
SELECT 
    1 AS GRP_NBR
,   LOWER_LIMIT
,   UPPER_LIMIT
FROM TMP_WORK_DB.GRP_NBRS
UNION ALL
SELECT
    GRP_NBR + 1 
,   LOWER_LIMIT + 2000000 -- set the interval to 2 million
,   UPPER_LIMIT + 2000000 -- can be adjusted as needed
FROM GRP_RECRSV
WHERE GRP_NBR < 120 -- needed a limit so that it would not be endless
)
SELECT * FROM GRP_RECRSV
;


-- delete the first entry because it was duplicated
DELETE FROM TMP_WORK_DB.GRP_NBRS WHERE GRP_NBR = 0;

-- set grp nbr using the limits table
INSERT TMP_WORK_DB.PROD_LIST_GRP
WITH NUMOFPRODS (PROD_NAME,QTY,RUNNING) AS
(
    SELECT 
        PROD_NAME
    ,   COUNT(DISTINCT PROD_ID) AS QTY
    ,   SUM(QTY) OVER (ORDER BY QTY ROWS UNBOUNDED PRECEDING) RUNNING
    FROM TMP_WORK_DB.PROD_LIST
    GROUP BY 1
)
SELECT 
    PROD_NAME
,   QTY
,   RUNNING
,   GRP_NBR
FROM NUMOFPRODS a
JOIN TMP_WORK_DB.GRP_NBRS b ON RUNNING BETWEEN LOWER_LIMIT AND UPPER_LIMIT
;
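The recursive CTE and the BETWEEN join above can be sketched in plain Python (same assumed interval of 2,000,000 and 120 groups as in the SQL; a sketch of the idea, not of Teradata's execution):

```python
INTERVAL = 2_000_000
NUM_GROUPS = 120

# Equivalent of the recursive CTE: build the limits table as rows of
# (grp_nbr, lower_limit, upper_limit), each range INTERVAL wide.
limits = [(n, (n - 1) * INTERVAL, n * INTERVAL)
          for n in range(1, NUM_GROUPS + 1)]

def grp_nbr(running):
    """Equivalent of the BETWEEN join against the limits table."""
    for n, lo, hi in limits:
        if lo <= running <= hi:
            return n
    return None  # running total falls outside all 120 ranges
```

As with BETWEEN in the SQL, the range ends are inclusive, so a running total that lands exactly on a boundary matches the lower of the two adjacent groups.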

Upvotes: 0

kishore krv

Reputation: 70

Hope the logic below helps, if it's about increments of 50.

UPDATE main FROM prod_list AS main,  
(  
SEL PROD_NAME
, QTY_SOLD
, SUM(QTY_SOLD) OVER (ORDER BY PROD_NAME ROWS UNBOUNDED PRECEDING) RUNNING  FROM prod_list   
) inr 
SET GROUP_NBR = RUNNING /50
WHERE main.PROD_NAME = inr.PROD_NAME ;
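One caveat worth noting (my observation, assuming integer division): RUNNING / 50 yields group 0 for running totals under 50 and bumps exactly at multiples of 50. Shifting by one before dividing gives 1-based groups, as a quick sketch shows:

```python
def group_by_division(running, cutoff=50):
    # Plain integer division: 0..49 -> 0, 50..99 -> 1, 100..149 -> 2
    return running // cutoff

def group_one_based(running, cutoff=50):
    # 1-based variant: 1..50 -> 1, 51..100 -> 2, and so on
    return (running - 1) // cutoff + 1
```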

Upvotes: 0

Twelfth

Reputation: 7180

Create a reference table and join to it...then the change only needs to be made in a table (you can even create a procedure to help automate changes to the table later on).

Pseudo-create:

Create table group_nbr (low_limit,upper_limit,group_nbr)

Insert your case values into that table and inner join to it using greater than and less than conditions.

select *, group_nbr.group_nbr
from table inner join group_nbr on RUNNING > lower_limit and RUNNING < upper_limit

The code won't quite work as it sits, but hopefully you get the idea well enough to adapt yours. I find keeping these values in reference tables like this far easier than altering code. You can even support multiple group_nbr setups by adding a group_id column to the group_nbr table: group_id 1 holds one set of running limits, group_ids 2, 3, 4, 5, etc. hold different sets, and a WHERE clause chooses which group_id to use.
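The reference-table join can be demonstrated end to end with SQLite from Python (table and column names are illustrative, not the question's schema; the running totals are precomputed from the question's data):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE prod (prod_name TEXT, running INTEGER)")
con.execute(
    "CREATE TABLE group_nbr (lower_limit INTEGER, upper_limit INTEGER, "
    "group_nbr INTEGER)")

# Running totals of QTY_SOLD, ordered by product name as in the question.
con.executemany("INSERT INTO prod VALUES (?, ?)",
                [("Keyboard", 25), ("Laptop", 70), ("Monitor", 126),
                 ("Mouse", 149), ("TV", 159)])

# The CASE cutoffs expressed as data instead of code; widening a group
# means updating a row here, not editing SQL.
con.executemany("INSERT INTO group_nbr VALUES (?, ?, ?)",
                [(0, 50, 1), (50, 100, 2), (100, 150, 3), (150, 200, 4)])

rows = con.execute("""
    SELECT p.prod_name, p.running, g.group_nbr
    FROM prod p
    JOIN group_nbr g
      ON p.running > g.lower_limit AND p.running <= g.upper_limit
    ORDER BY p.prod_name
""").fetchall()
```

Making the lower bound exclusive and the upper bound inclusive avoids the double-match a plain BETWEEN would give at the boundaries.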

Upvotes: 0
