Cathal Clavie
Cathal Clavie

Reputation: 17

Running sum with missing dates

I have a table of data like this:

PeriodYearMonth Reg      PartNo             ComponentRemovals   RunningRemovals
2019 10         G-NHVP   109-0740V01-137    1                    1
2019 11         G-NHVP   109-0740V01-137    1                    2  
2019 12         G-NHVP   109-0740V01-137    1                    3
2020 01         G-NHVP   109-0740V01-137    1                    4
2019 10         OO-NSF   11-13354P          1                    1
2019 09         G-NHVR   11-13354P          2                    2 
2019 10         OY-HMV   11-13354P          1                    1

In the last column I calculated the 7-month running sum of removals per PartNo and per Reg. To do this I made the following code:

/****** Find Running Component Removals ******/
SELECT  [PeriodYearMonth]
      ,[Reg]
      ,[PartNo]
      ,[ComponentRemovals]
      ,sum(sum(ComponentRemovals)) over (Partition by [Reg], PartNo Order By PeriodYearMonth, PeriodYearMonth rows between 6 preceding and current row) as RunningRemovals
      ,[ConfirmedFailures]
      , sum(sum(ConfirmedFailures)) over (Partition by [Reg], PartNo Order By PeriodYearMonth, PeriodYearMonth rows between 6 preceding and current row) as RunningFailures

  FROM [RALNHVTST].[dbo].[vtRelRepComponentsRemovalsByPartNo]

  Group by Reg, PartNo, ComponentRemovals, ConfirmedFailures, PeriodYearMonth
  Order by PartNo

However, since not all months are included in the PeriodYearMonth column the results are incorrect. I've seen solutions online for simple cases, but the tricky part for me is, that I need an entry for every PartNo, every Reg and every month.

Any help would be greatly appreciated.

Regards

Upvotes: 1

Views: 2208

Answers (2)

Nick
Nick

Reputation: 147266

To solve this problem you need to create separate derived tables of all the PeriodYearMonth values (generated using a recursive CTE) and the distinct Reg and PartNo pairs, and CROSS JOIN them to each other to get all combinations of the columns. This combined table can then be LEFT JOINed to the original table to get the ComponentRemovals for each Reg, PartNo and PeriodYearMonth, and these can then be summed using a window function:

WITH CTE AS (
  SELECT CONVERT(DATE, CONCAT(REPLACE(MIN(PeriodYearMonth), ' ', '-'), '-01'), 23)  AS date, 
         CONVERT(DATE, CONCAT(REPLACE(MAX(PeriodYearMonth), ' ', '-'), '-01'), 23)  AS max_date
  FROM vtRelRepComponentsRemovalsByPartNo
  UNION ALL
  SELECT DATEADD(MONTH, 1, date), max_date
  FROM CTE
  WHERE date < max_date
)
SELECT FORMAT(CTE.date, 'yyyy MM') AS PeriodYearMonth
     , rp.Reg
     , rp.PartNo
     , COALESCE(v.ComponentRemovals, 0) AS ComponentRemovals
     , SUM(COALESCE(v.ComponentRemovals, 0)) OVER (PARTITION BY rp.Reg, rp.PartNo ORDER BY FORMAT(CTE.date, 'yyyy MM')
                                                   ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS RunningRemovals
FROM CTE
CROSS JOIN (SELECT DISTINCT Reg, PartNo FROM vtRelRepComponentsRemovalsByPartNo) rp
LEFT JOIN vtRelRepComponentsRemovalsByPartNo v
       ON v.PeriodYearMonth = FORMAT(CTE.date, 'yyyy MM')
      AND v.Reg = rp.Reg
      AND v.PartNo = rp.PartNo
ORDER BY rp.Reg, rp.PartNo, CTE.date

Output

PeriodYearMonth     Reg     PartNo              ComponentRemovals   RunningRemovals
2019 09             G-NHVP  109-0740V01-137     0                   0
2019 10             G-NHVP  109-0740V01-137     1                   1
2019 11             G-NHVP  109-0740V01-137     1                   2
2019 12             G-NHVP  109-0740V01-137     1                   3
2020 01             G-NHVP  109-0740V01-137     1                   4
2019 09             G-NHVR  11-13354P           2                   2
2019 10             G-NHVR  11-13354P           0                   2
2019 11             G-NHVR  11-13354P           0                   2
2019 12             G-NHVR  11-13354P           0                   2
2020 01             G-NHVR  11-13354P           0                   2
2019 09             OO-NSF  11-13354P           0                   0
2019 10             OO-NSF  11-13354P           1                   1
2019 11             OO-NSF  11-13354P           0                   1
2019 12             OO-NSF  11-13354P           0                   1
2020 01             OO-NSF  11-13354P           0                   1
2019 09             OY-HMV  11-13354P           0                   0
2019 10             OY-HMV  11-13354P           1                   1
2019 11             OY-HMV  11-13354P           0                   1
2019 12             OY-HMV  11-13354P           0                   1
2020 01             OY-HMV  11-13354P           0                   1

Demo on SQLFiddle

Note that if all PeriodYearMonth values of interest are present in vtRelRepComponentsRemovalsByPartNo you can simply use a SELECT DISTINCT ... subquery to get those values rather than the recursive CTE e.g.

SELECT d.PeriodYearMonth
     , rp.Reg
     , rp.PartNo
     , COALESCE(v.ComponentRemovals, 0) AS ComponentRemovals
     , SUM(COALESCE(v.ComponentRemovals, 0)) OVER (PARTITION BY rp.Reg, rp.PartNo ORDER BY d.PeriodYearMonth
                                                   ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS RunningRemovals
FROM (SELECT DISTINCT PeriodYearMonth FROM vtRelRepComponentsRemovalsByPartNo) d
CROSS JOIN (SELECT DISTINCT Reg, PartNo FROM vtRelRepComponentsRemovalsByPartNo) rp
LEFT JOIN vtRelRepComponentsRemovalsByPartNo v
       ON v.PeriodYearMonth = d.PeriodYearMonth
      AND v.Reg = rp.Reg
      AND v.PartNo = rp.PartNo
ORDER BY rp.Reg, rp.PartNo, d.PeriodYearMonth

Demo on SQLFiddle

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1271151

In the last column I calculated the running sum of removals per PartNo and per Reg over the past 7 months.

Removing the windowing clause and filter:

select [PeriodYearMonth], [Reg], [PartNo], [ComponentRemovals],
        sum(sum(ComponentRemovals)) over (Partition by [Reg], PartNo Order By PeriodYearMonth, PeriodYearMonth) as RunningRemovals
        [ConfirmedFailures],
        sum(sum(ConfirmedFailures)) over (Partition by [Reg], PartNo Order By PeriodYearMonth, PeriodYearMont) as RunningFailures
from [RALNHVTST].[dbo].[vtRelRepComponentsRemovalsByPartNo]
where PeriodYearMonth >= format(dateadd(month, -7, getdate()), 'yyyy MM')
group by Reg, PartNo, ComponentRemovals, ConfirmedFailures, PeriodYearMonth
Order by PartNo

Upvotes: 0

Related Questions