Ripper7801
Ripper7801

Reputation: 11

TSQL - fill out missing months with totals

My table 'Amount' provides totals per month, but not for every month:

Employee  Reference_no  YearMonth  Amount
1         1             202001     400
1         1             202002     600
1         1             202005     250
1         2             202001     100
1         2             202003     700   

After joining the default 'Calendar' table, I am able to display the missing months:

Employee  Reference_no  YearMonth  Amount
1         1             202001     400
1         1             202002     600
1         1             202003     NULL
1         1             202004     NULL
1         1             202005     250
1         2             202001     100
1         2             202002     NULL
1         2             202003     700 

Unfortunately the totals are missing, while they should be transferred from the last filled month. So, for both 202003 and 202004 (Ref_no 1) the amount should be 600. For 202002 (Ref_no_2) the amount should be 100.

I have been struggling for a long time now, but can't seem to get this working. I found both an OUTER APPLY TOP (1) and SELECT TOP(1) ORDER BY for the Amount field solution, but because of the large size of the dataset the query gets stuck when attempting to run.

Any suggestions are highly appreciated!

Upvotes: 0

Views: 152

Answers (4)

Daniel Gimenez
Daniel Gimenez

Reputation: 20654

All you need to do is to transform the rows from Amount so that they encompass the range in which they're valid and then join them to the calendar table.

The rows will implicitly have the start date in the YearMonth column. To get the cutoff date, you can use the LEAD window function to get the date from the next row in order. If there is no next row, then just add 1 to the current row's YearMonth so that only that row is used.

SELECT amt.Employee, amt.Reference_no, cal.YearMonth, amt.Amount
FROM (
  SELECT *, 
    LEAD(YearMonth, 1, YearMonth + 1)
      OVER (PARTITION BY Employee, Reference_no ORDER BY YearMonth) [Cutoff]
  FROM Amount
) amt
INNER JOIN Calendar cal ON cal.YearMonth >= amt.YearMonth AND cal.YearMonth < amt.Cutoff

Upvotes: 2

ASH
ASH

Reputation: 20362

I think you want to fill in nulls from the prior non-null, right. Try it this way and see if it does what you want.

CREATE TABLE mytest (YearMonth Varchar(10) , Amount int)
INSERT INTO mytest values('202001', 400)
INSERT INTO mytest values('202002', 600)
INSERT INTO mytest values('202003', NULL)
INSERT INTO mytest values('202004', NULL)
INSERT INTO mytest values('202005', 250)
INSERT INTO mytest values('202006', 100)
INSERT INTO mytest values('202007', NULL)
INSERT INTO mytest values('202008', 700)


select * from mytest

enter image description here

SELECT  yearMonth,
        ISNULL(Amount, (SELECT TOP 1 Amount FROM mytest WHERE YearMonth < t.YearMonth AND Amount IS NOT NULL ORDER BY YearMonth DESC))
from mytest t

enter image description here

Upvotes: 0

G B
G B

Reputation: 1462

Note: I have expanded the output slightly to demonstrate the need to handle where an employee has no cumulative total for any preceding month e.g. a new employee starting part way through the year.

The approach below is not as elegant because it does not use a window function, but it may be more intuitive if you are not familiar with window functions.

The first part is just setting up the data. I have included it so that the script stands by itself and you can play with it. skip down to cte_padded_employee_totals for the first statement for the actual answer.

The in-line function (OUTER APPLY) gets the preceding row which has a value for amount for each employee / reference_no pair.


DECLARE @vd_start_calendar_date DATE = '20191101'
DECLARE @vd_end_calendar_date   DATE = '20200801'

DECLARE @t_calendar TABLE
(   first_of_month  DATE,
    year_mth_string CHAR(6)
)

DECLARE @t_cumulative_employee_totals_by_month TABLE
(   employee    INT,
    ref_no      INT,
    year_mth    CHAR(6),
    amount      INT
)

INSERT INTO @t_cumulative_employee_totals_by_month
(   employee    ,
    ref_no      ,
    year_mth    ,
    amount      
)
VALUES
    (1,1,'202001',400),
    (1,1,'202002',600),
    (1,1,'202005',250),
    (1,2,'202001',100),
    (1,2,'202003',700) 

 ;WITH cte_gen_month_data
    AS (SELECT first_of_month = @vd_start_calendar_date
 UNION ALL
        SELECT first_of_month = DATEADD(month, 1, first_of_month)
          FROM cte_gen_month_data
         WHERE first_of_month < @vd_end_calendar_date
       )
INSERT INTO @t_calendar
(      first_of_month,
       year_mth_string
)
SELECT first_of_month,
       CONVERT(CHAR(4), YEAR(first_of_month)) + RIGHT('0' + CONVERT(VARCHAR(2), MONTH(first_of_month)), 2)
  FROM cte_gen_month_data

 -- I have used a common table expression here because we want to refer to the same result set more than once.
;WITH cte_padded_employee_totals
   AS (SELECT ce.*, 
              et.amount
         FROM (SELECT * 
                 FROM @t_calendar c
                 CROSS JOIN (SELECT DISTINCT employee, ref_no
                               FROM @t_cumulative_employee_totals_by_month) e
              ) ce
         LEFT JOIN 
              @t_cumulative_employee_totals_by_month et ON ce.year_mth_string = et.year_mth
                                                       AND ce.employee        = et.employee
                                                       AND ce.ref_no          = et.ref_no
       )
SELECT pt.employee,
       pt.ref_no,
       pt.year_mth_string,
       amount = COALESCE(pt.amount, mt.amount, 0)
  FROM cte_padded_employee_totals pt
 OUTER APPLY 
       (SELECT TOP 1 amount = et.amount
          FROM cte_padded_employee_totals et
         WHERE et.employee = pt.employee
           AND et.ref_no = pt.ref_no
           AND et.first_of_month < pt.first_of_month
           AND et.amount IS NOT NULL
           AND pt.amount IS NULL
         ORDER BY first_of_month DESC
        ) mt -- missing totals

Upvotes: 0

forpas
forpas

Reputation: 164214

Create a CTE with your query and another one which returns for each row the last YearMonth with a non null Amount.
Then join the CTEs:

WITH 
  query AS (............), -- this is your query
  cte AS (
    SELECT *,
      MAX(CASE WHEN [Amount] IS NOT NULL THEN [YearMonth] END) OVER(
        PARTITION BY [Employee], [Reference_no]
        ORDER BY [YearMonth] 
        ROWS UNBOUNDED PRECEDING
      ) AS n
    FROM query  
  )
SELECT c.[Employee], c.[Reference_no], c.[YearMonth],
       COALESCE(c.[Amount], q.[Amount]) [Amount]
FROM cte c LEFT JOIN query q
ON q.[Employee] = c.[Employee] AND q.[Reference_no] = c.[Reference_no] AND q.[YearMonth] = c.n
ORDER BY c.[Employee], c.[Reference_no], c.[YearMonth]

See the demo.

Upvotes: 2

Related Questions