Reputation: 13
I understand how to calculate a running total in SQL (DB2) using an analytic function. However, I need to START the running total a) at different points (rows) per group & b) restart the running total every 'policy period'. For instance, if I want to calculate the running total cost per member per month per policy period, for the groups that renew on January 1st of each year, this is straight-forward, as I can partition by year & group_id (& order by month/year) in order to restart the running total per policy period (as the policy period is equivalent to a calendar year).
The challenge begins when I have groups whose policies renew in July of each year, for example, & therefore I need to start (& then re-start) the running total amount in July of each year for that subset of groups.
If I can create a column that segments the policy periods per group, then I can partition by group_id, policy period & order by month/year, I believe. For instance, for a certain group that renews in July of each year, the policy_period column indicates '1' from 201207-201306, '2' from 201307-201406, etc. (This essentially represents the number of policy periods I have as training data for my ML mode.)
Can anyone assist with creating the aforementioned column? I am trying to avoid hard-coding this with numerous case statements, but that may be the route I have to take. Any ideas?
As an aside, I was trying to think outside-the-box & thought that I could potentially transform July to be read as January, but this may also be too time-consuming, as groups can renew their policies in any of the 12-months of the year. I also have a binary column that indicates 1 if the row is a renewal month for a given group, else 0, but this does not assist with the partitioning/ordering in an analytic function to calculate the running total.
Example Table of Current Data:
GRP_ID YR_MO MONTHLY_COST NEXT_RENEWAL_YR_MO
001 201201 10 202007
001 201202 10 202007
001 201203 10 202007
001 201204 10 202007
001 201205 10 202007
001 201206 10 202007
001 201207 10 202007
001 201208 10 202007
001 201209 10 202007
001 201210 10 202007
001 201211 10 202007
001 201212 10 202007
001 201301 10 202007
001 201302 10 202007
001 201303 10 202007
001 201304 10 202007
001 201305 10 202007
001 201306 10 202007
001 201307 10 202007
001 201308 10 202007
Example Table of Desired Results:
GRP_ID YR_MO POLICY_PERIOD MONTHLY_COST RUN_TOT_AMT_PER_PLCY_PRD
001 201201 1 10 10
001 201202 1 10 20
001 201203 1 10 30
001 201204 1 10 40
001 201205 1 10 50
001 201206 1 10 60
001 201207 2 10 10
001 201208 2 10 20
001 201209 2 10 30
001 201210 2 10 40
001 201211 2 10 50
001 201212 2 10 60
001 201301 2 10 70
001 201302 2 10 80
001 201303 2 10 90
001 201304 2 10 100
001 201305 2 10 110
001 201306 2 10 120
001 201307 3 10 10
001 201308 3 10 20
The above is just a single example. Obviously, partitioning by group_id is simple, but how can I a) create the policy period column to then b) start the running total amount per policy period per group?
The policy period column would involve using denserank(), but starting & re-starting on the seventh month of each year for certain groups that renew in July of each year, vs. the third month of each year for other groups renewing in March each year, etc. is the challenge here.
I am thinking along the lines of:
CASE
WHEN RIGHT(NEXT_RENEWAL_YR_MO,2) = RIGHT(YR_MO,2) THEN 1
ELSE 0
END AS RESTART_RUN_TOT_HERE
...but I continue to get stuck on how to get the running total to actually RESTART at each subsequent '1' per group when the rest of the rows indicate '0'.
Upvotes: 0
Views: 67
Reputation: 12454
Try this. If you uncomment the commented out block, you may run the statement as is to get the result you provided.
/*
WITH TAB (GRP_ID, YR_MO, MONTHLY_COST, NEXT_RENEWAL_YR_MO) AS
(
VALUES
('001', '201202', 10, '202007')
, ('001', '201201', 10, '202007')
, ('001', '201203', 10, '202007')
, ('001', '201204', 10, '202007')
, ('001', '201205', 10, '202007')
, ('001', '201206', 10, '202007')
, ('001', '201207', 10, '202007')
, ('001', '201208', 10, '202007')
, ('001', '201209', 10, '202007')
, ('001', '201210', 10, '202007')
, ('001', '201211', 10, '202007')
, ('001', '201212', 10, '202007')
, ('001', '201301', 10, '202007')
, ('001', '201302', 10, '202007')
, ('001', '201303', 10, '202007')
, ('001', '201304', 10, '202007')
, ('001', '201305', 10, '202007')
, ('001', '201306', 10, '202007')
, ('001', '201307', 10, '202007')
, ('001', '201308', 10, '202007')
)
*/
SELECT
GRP_ID, YR_MO
, SUM(CASE WHEN RIGHT(YR_MO, 2) = RIGHT(NEXT_RENEWAL_YR_MO, 2) THEN 1 ELSE 0 END) OVER (PARTITION BY GRP_ID ORDER BY YR_MO) + 1 AS POLICY_PERIOD
, MONTHLY_COST
, SUM(MONTHLY_COST) OVER
(
PARTITION BY
GRP_ID
, SUM(CASE WHEN RIGHT(YR_MO, 2) = RIGHT(NEXT_RENEWAL_YR_MO, 2) THEN 1 ELSE 0 END) OVER (PARTITION BY GRP_ID ORDER BY YR_MO)
ORDER BY YR_MO
) AS RUN_TOT_AMT_PER_PLCY_PRD
FROM TAB;
Upvotes: 1