Reputation: 149
Suppose I have the following data
ID | year_month | Department
1233 | 2020-01-01 | A
1123 | 2020-02-01 | A
1123 | 2020-03-01 | NULL
1123 | 2020-04-01 | B
1123 | 2020-05-01 | B
1123 | 2020-06-01 | B
1123 | 2020-07-01 | NULL
9999 | 2020-01-01 | A
9999 | 2020-02-01 | A
9999 | 2020-03-01 | B
9999 | 2020-04-01 | B
9999 | 2020-05-01 | B
9999 | 2020-06-01 | A
9999 | 2020-07-01 | B
I want to identify the changes in department. , including going to NA/NULL. The desired output is:
ID | Change_year_month | Old_Department | New_Department
1123 | 2020-03-01 | A | NULL
1123 | 2020-04-01 | NULL | B
1123 | 2020-07-01 | B | NULL
9999 | 2020-03-01 | A | B
9999 | 2020-06-01 | B | A
Ideas I've already tried to pursue:
with x as(
SELECT T1.ID, T1.Department, MIN(T1.year_month) AS Change_year_month FROM dbo.Source
GROUP BY T1.ID, Department),
y as (
SELECT ID, year_month,
rown = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY year_month) FROM x
)
select y.ID, T2.Department, year_month AS Change_year_month FROM y
right join (SELECT T1.ID,
MAX(Department) as Old_Department,
Min(Department) AS New_Department
FROM dbo.Source
GROUP BY T1.ID HAVING COUNT(DISTINCT(Department)) >= 2) T2 on y.ID = T2.ID
where rown = 1
However, this does not yield the desired result. Whenever a NULL is involved, the query does not see the change. Whenever I change the NULL to something else (like: 'outside the scope'), then the ordering is wrong as the Old_department is never 'outside the scope', but the New_department always is. Also, I feel like the code is inefficient and not durable.
Does anyone have suggestions how to proceed or to construct of durable query?
Upvotes: 0
Views: 36
Reputation: 1270583
Here is a pretty simple method using lag()
:
select s.id, s.year_month, s.prev_department, s.department
from (select s.*,
lag(year_month) over (partition id order by year_month) as prev_ym,
lag(year_month) over (partition id, department order by year_month) as prev_ym_dept,
lag(department) over (partition by id order by year_month) as prev_department
from dbo.source s
) s
where prev_ym_dept <> prev_ym;
This looks at the dates for the comparison, so it just handles NULL
values.
Of course, you can use more complicated comparisons:
select s.id, s.year_month, s.prev_department, s.department
from (select s.*,
lag(year_month) over (partition id order by year_month) as prev_ym,
min(year_month) over (partition by id) as min_year_month
from dbo.source s
) s
where prev_department <> department or
(department is null and
prev_department is not null
) or
(prev_department is null and
department is not null and
year_month <> min_year_month
)
But that is rather tricky to express. And that might even have a mistake in filtering out the first row.
Upvotes: 1