Reputation: 11
My table 'Amount' provides totals per month, but not for every month:
Employee Reference_no YearMonth Amount
1 1 202001 400
1 1 202002 600
1 1 202005 250
1 2 202001 100
1 2 202003 700
After joining the default 'Calendar' table, I am able to display the missing months:
Employee Reference_no YearMonth Amount
1 1 202001 400
1 1 202002 600
1 1 202003 NULL
1 1 202004 NULL
1 1 202005 250
1 2 202001 100
1 2 202002 NULL
1 2 202003 700
Unfortunately the totals are missing, while they should be transferred from the last filled month. So, for both 202003 and 202004 (Ref_no 1) the amount should be 600. For 202002 (Ref_no_2) the amount should be 100.
I have been struggling for a long time now, but can't seem to get this working. I found both an OUTER APPLY TOP (1) and SELECT TOP(1) ORDER BY for the Amount field solution, but because of the large size of the dataset the query gets stuck when attempting to run.
Any suggestions are highly appreciated!
Upvotes: 0
Views: 152
Reputation: 20654
All you need to do is to transform the rows from Amount so that they encompass the range in which they're valid and then join them to the calendar table.
The rows will implicitly have the start date in the YearMonth column. To get the cutoff date, you can use the LEAD window function to get the date from the next row in order. If there is no next row, then just add 1 to the current row's YearMonth so that only that row is used.
SELECT amt.Employee, amt.Reference_no, cal.YearMonth, amt.Amount
FROM (
SELECT *,
LEAD(YearMonth, 1, YearMonth + 1)
OVER (PARTITION BY Employee, Reference_no ORDER BY YearMonth) [Cutoff]
FROM Amount
) amt
INNER JOIN Calendar cal ON cal.YearMonth >= amt.YearMonth AND cal.YearMonth < amt.Cutoff
Upvotes: 2
Reputation: 20362
I think you want to fill in nulls from the prior non-null, right. Try it this way and see if it does what you want.
CREATE TABLE mytest (YearMonth Varchar(10) , Amount int)
INSERT INTO mytest values('202001', 400)
INSERT INTO mytest values('202002', 600)
INSERT INTO mytest values('202003', NULL)
INSERT INTO mytest values('202004', NULL)
INSERT INTO mytest values('202005', 250)
INSERT INTO mytest values('202006', 100)
INSERT INTO mytest values('202007', NULL)
INSERT INTO mytest values('202008', 700)
select * from mytest
SELECT yearMonth,
ISNULL(Amount, (SELECT TOP 1 Amount FROM mytest WHERE YearMonth < t.YearMonth AND Amount IS NOT NULL ORDER BY YearMonth DESC))
from mytest t
Upvotes: 0
Reputation: 1462
Note: I have expanded the output slightly to demonstrate the need to handle where an employee has no cumulative total for any preceding month e.g. a new employee starting part way through the year.
The approach below is not as elegant because it does not use a window function, but it may be more intuitive if you are not familiar with window functions.
The first part is just setting up the data. I have included it so that the script stands by itself and you can play with it. skip down to cte_padded_employee_totals
for the first statement for the actual answer.
The in-line function (OUTER APPLY
) gets the preceding row which has a value for amount for each employee / reference_no pair.
DECLARE @vd_start_calendar_date DATE = '20191101'
DECLARE @vd_end_calendar_date DATE = '20200801'
DECLARE @t_calendar TABLE
( first_of_month DATE,
year_mth_string CHAR(6)
)
DECLARE @t_cumulative_employee_totals_by_month TABLE
( employee INT,
ref_no INT,
year_mth CHAR(6),
amount INT
)
INSERT INTO @t_cumulative_employee_totals_by_month
( employee ,
ref_no ,
year_mth ,
amount
)
VALUES
(1,1,'202001',400),
(1,1,'202002',600),
(1,1,'202005',250),
(1,2,'202001',100),
(1,2,'202003',700)
;WITH cte_gen_month_data
AS (SELECT first_of_month = @vd_start_calendar_date
UNION ALL
SELECT first_of_month = DATEADD(month, 1, first_of_month)
FROM cte_gen_month_data
WHERE first_of_month < @vd_end_calendar_date
)
INSERT INTO @t_calendar
( first_of_month,
year_mth_string
)
SELECT first_of_month,
CONVERT(CHAR(4), YEAR(first_of_month)) + RIGHT('0' + CONVERT(VARCHAR(2), MONTH(first_of_month)), 2)
FROM cte_gen_month_data
-- I have used a common table expression here because we want to refer to the same result set more than once.
;WITH cte_padded_employee_totals
AS (SELECT ce.*,
et.amount
FROM (SELECT *
FROM @t_calendar c
CROSS JOIN (SELECT DISTINCT employee, ref_no
FROM @t_cumulative_employee_totals_by_month) e
) ce
LEFT JOIN
@t_cumulative_employee_totals_by_month et ON ce.year_mth_string = et.year_mth
AND ce.employee = et.employee
AND ce.ref_no = et.ref_no
)
SELECT pt.employee,
pt.ref_no,
pt.year_mth_string,
amount = COALESCE(pt.amount, mt.amount, 0)
FROM cte_padded_employee_totals pt
OUTER APPLY
(SELECT TOP 1 amount = et.amount
FROM cte_padded_employee_totals et
WHERE et.employee = pt.employee
AND et.ref_no = pt.ref_no
AND et.first_of_month < pt.first_of_month
AND et.amount IS NOT NULL
AND pt.amount IS NULL
ORDER BY first_of_month DESC
) mt -- missing totals
Upvotes: 0
Reputation: 164214
Create a CTE with your query and another one which returns for each row the last YearMonth
with a non null Amount
.
Then join the CTEs:
WITH
query AS (............), -- this is your query
cte AS (
SELECT *,
MAX(CASE WHEN [Amount] IS NOT NULL THEN [YearMonth] END) OVER(
PARTITION BY [Employee], [Reference_no]
ORDER BY [YearMonth]
ROWS UNBOUNDED PRECEDING
) AS n
FROM query
)
SELECT c.[Employee], c.[Reference_no], c.[YearMonth],
COALESCE(c.[Amount], q.[Amount]) [Amount]
FROM cte c LEFT JOIN query q
ON q.[Employee] = c.[Employee] AND q.[Reference_no] = c.[Reference_no] AND q.[YearMonth] = c.n
ORDER BY c.[Employee], c.[Reference_no], c.[YearMonth]
See the demo.
Upvotes: 2