Hello World
Hello World

Reputation: 3

How to merge row based on condition in SQL Server

I have table in a SQL Server database table Department and it has 4 columns:

staffNumber department startDate endDate
100 A 2016-09-22 18:30:00.000 2020-02-06 18:29:00.000
100 B 2020-02-06 18:30:00.000 2022-08-21 18:29:00.000
100 A 2022-08-21 18:30:00.000 2079-12-31 18:29:00.000
101 A 2018-06-03 18:30:00.000 2023-08-20 18:29:00.000
101 B 2023-08-20 18:30:00.000 2079-12-31 18:29:00.000
102 A 2022-12-21 18:30:00.000 2023-12-29 18:29:00.000
102 A 2023-12-29 18:30:00.000 2079-12-31 18:29:00.000
103 A 2016-06-22 18:30:00.000 2018-03-05 18:29:00.000
103 A 2018-03-05 18:30:00.000 2021-03-05 18:29:00.000
103 A 2021-03-05 18:30:00.000 2079-12-31 18:29:00.000
104 A 2016-06-22 18:30:00.000 2021-03-05 18:29:00.000
104 A 2021-03-05 18:30:00.000 2079-12-31 18:29:00.000
105 B 2016-08-22 18:30:00.000 2021-02-07 18:29:00.000
105 B 2021-02-09 18:30:00.000 2079-12-31 18:29:00.000

I want to merge more then rows based on specific conditions:

This is my SQL query I tried, but I am not getting the expected output:

WITH DeptData_CTE AS 
(
    SELECT 
        staffNumber,
        department,
        startDate,
        endDate
        LEAD(startDate) OVER (PARTITION BY staffNumber, department ORDER BY startDate) AS nextstartDate,
        LAG(endDate) OVER (PARTITION BY staffNumber, department ORDER BY startDate) AS prevendDate
    FROM 
        department
),
MergedData_CTE AS 
(
    SELECT 
        staffNumber,
        department,
        MIN(startDate) AS startDate,
        MAX(endDate) AS endDate
    FROM 
        DeptData_CTE
    -- We merge rows where previous expiry date and current effective date are consecutive
    WHERE 
        prevendDate IS NULL 
        OR DATEDIFF(day, prevendDate, startDate) != 0
    GROUP BY 
        staffNumber, 
        department
)
-- Final selection of rows
SELECT 
    staffNumber,
    department,
    startDate,
    endDate
FROM 
    MergedData_CTE
ORDER BY 
    staffNumber, 
    department, 
    startDate;

I am getting expected startDate with merged rows with two consecutive rows, but not getting expected endDate.

My expected output would be:

staffNumber department startDate endDate
100 A 2016-09-22 18:30:00.000 2020-02-06 18:29:00.000
100 B 2020-02-06 18:30:00.000 2022-08-21 18:29:00.000
100 A 2022-08-21 18:30:00.000 2079-12-31 18:29:00.000
101 A 2018-06-03 18:30:00.000 2023-08-20 18:29:00.000
101 B 2023-08-20 18:30:00.000 2079-12-31 18:29:00.000
102 A 2022-12-21 18:30:00.000 2079-12-31 18:29:00.000
103 A 2016-06-22 18:30:00.000 2079-12-31 18:29:00.000
104 A 2016-06-22 18:30:00.000 2079-12-31 18:29:00.000
105 B 2016-08-22 18:30:00.000 2021-02-07 18:29:00.000
105 B 2021-02-09 18:30:00.000 2079-12-31 18:29:00.000

For staff numer 105, the row is not merged because date difference is >= 1 day.

I do not want to use a stored procedure. I just want to use SELECT SQL query

Upvotes: 0

Views: 62

Answers (1)

ValNik
ValNik

Reputation: 5926

This is gaps and islands task.

  1. We found gaps:
  • previous department<>current department (if this is first row we take default for lag() - current department.
  • days between previous endDate and current startDate >0
  1. Assign number for gaps. This is sum(isGap)over(partition by staffNumber order by startDate) gn.

  2. Then merge islands by group by.

See example

with DeptData as(
select *
  ,case when department<>lag(department,1,department)
              over(partition by staffNumber order by startDate) then 1
        when datediff(day,lag(endDate,1,startDate)over(partition by staffNumber order by startDate)
                         ,startDate)>0  then 1
      else 0
      end isGap
   from department
)
,gaps as(
  select * 
    ,sum(isGap)over(partition by staffNumber order by startDate) gn
  from DeptData
)
select staffNumber,department
  ,min(startDate)startDate,max(endDate)endDate
from gaps
group by staffNumber,department,gn
order by staffNumber,startDate,department
staffNumber department startDate endDate
100 A 2016-09-22 18:30:00.000 2020-02-06 18:29:00.000
100 B 2020-02-06 18:30:00.000 2022-08-21 18:29:00.000
100 A 2022-08-21 18:30:00.000 2079-12-31 18:29:00.000
101 A 2018-06-03 18:30:00.000 2023-08-20 18:29:00.000
101 B 2023-08-20 18:30:00.000 2079-12-31 18:29:00.000
102 A 2022-12-21 18:30:00.000 2079-12-31 18:29:00.000
103 A 2016-06-22 18:30:00.000 2079-12-31 18:29:00.000
104 A 2016-06-22 18:30:00.000 2079-12-31 18:29:00.000
105 B 2016-08-22 18:30:00.000 2021-02-07 18:29:00.000
105 B 2021-02-09 18:30:00.000 2079-12-31 18:29:00.000

fiddle

Before group by we get this result.
There you see gaps (isGap) and islands (gn) particularily for stuffNumber.

staffNumber department startDate endDate isGap gn
100 A 2016-09-22 18:30:00.000 2020-02-06 18:29:00.000 0 0
100 B 2020-02-06 18:30:00.000 2022-08-21 18:29:00.000 1 1
100 A 2022-08-21 18:30:00.000 2079-12-31 18:29:00.000 1 2
101 A 2018-06-03 18:30:00.000 2023-08-20 18:29:00.000 0 0
101 B 2023-08-20 18:30:00.000 2079-12-31 18:29:00.000 1 1
102 A 2022-12-21 18:30:00.000 2023-12-29 18:29:00.000 0 0
102 A 2023-12-29 18:30:00.000 2079-12-31 18:29:00.000 0 0
103 A 2016-06-22 18:30:00.000 2018-03-05 18:29:00.000 0 0
103 A 2018-03-05 18:30:00.000 2021-03-05 18:29:00.000 0 0
103 A 2021-03-05 18:30:00.000 2079-12-31 18:29:00.000 0 0
104 A 2016-06-22 18:30:00.000 2021-03-05 18:29:00.000 0 0
104 A 2021-03-05 18:30:00.000 2079-12-31 18:29:00.000 0 0
105 B 2016-08-22 18:30:00.000 2021-02-07 18:29:00.000 0 0
105 B 2021-02-09 18:30:00.000 2079-12-31 18:29:00.000 1 1

We can use partition by stuffNumber,department. Then gap is when days between previous endDate and current startDate >0. See example 2 in fiddle

Upvotes: 1

Related Questions