Reputation: 116790
I have a table like this:
CREATE TABLE #TEMP (Name VARCHAR(255), START_TIME datetime, END_TIME datetime);
INSERT INTO #TEMP VALUES('John', '2012-01-01 09:00:01', '2012-01-01 12:00:02')
INSERT INTO #TEMP VALUES('John', '2012-01-01 09:40:01', '2012-01-01 11:00:02')
INSERT INTO #TEMP VALUES('John', '2012-01-02 05:00:01', '2012-01-02 05:15:02')
INSERT INTO #TEMP VALUES('David', '2012-01-04 05:00:01', '2012-01-04 05:15:02')
INSERT INTO #TEMP VALUES('David', '2012-01-05 07:01:01', '2012-01-05 15:15:02')
SELECT *
FROM #TEMP
DROP TABLE #TEMP
And the data is:
Name START_TIME END_TIME
1 John 2012-01-01 09:00:01.000 2012-01-01 12:00:02.000
2 John 2012-01-01 09:40:01.000 2012-01-01 11:00:02.000
3 John 2012-01-02 05:00:01.000 2012-01-02 05:15:02.000
4 David 2012-01-04 05:00:01.000 2012-01-04 05:15:02.000
5 David 2012-01-05 07:01:01.000 2012-01-05 08:15:02.000
Given a number say, 6, I am trying to do a GROUP BY
on this table and merge times that overlap within a window of 6 hours before and after. Therefore, in the above table, rows 1
and 2
would be merged into a single row as they contain overlapping time range:
John 2012-01-01 06:00:01.000 2012-01-01 18:00:02.000
Rows 4
and 5
will be merged because subtracting 6 hours from 07:01:01.000
falls into the window of row 4
.
Is there a good way of doing this on a large table containing about a million rows?
Upvotes: 4
Views: 288
Reputation: 51645
I think that the best way to do this is creating a windows
table and join #temp table with this new window table:
1) Step 1, preparing window table with all possible windows gaps (contains overlaping windows):
SELECT
Name,
dateadd(hour, -6, start_time) as start_w,
dateadd(hour, +6, start_time) as end_w
into #possible_windows
FROM #TEMP
2) Create an index on temp table to improve performance
create index pw_idx on #possible_windows ( Name, start_w)
3) Eliminate overlaping windows in a self join select. This is the reason to create the index:
select p2.*
into #myWindows
from #possible_windows p1
right outer join #possible_windows p2
on p1.name = p2.name and
p2.start_w > p1.start_W and p2.start_w <= p1.end_w
where p1.name is null
4) Join your table with #myWindows or use it directly.
WORKING:
SELECT
Name,
dateadd(hour, -6, start_time) as start_w,
dateadd(hour, +6, start_time) as end_w,
ROW_NUMBER() over(partition by Name order by Name,
dateadd(hour, -6, start_time) ) as rn
into #possible_windows
FROM #TEMP
create index pw_idx on #possible_windows ( Name, start_w)
select p2.*
from #possible_windows p1
right outer join #possible_windows p2
on p1.name = p2.name and
p2.start_w > p1.start_W and p2.start_w <= p1.end_w
where p1.name is null
RESULTS:
Name start_w end_w rn
----- ------------- ------------- --
David 2012-01-03 23:00:012012-01-04 11:00:011
David 2012-01-05 01:01:012012-01-05 13:01:012
John 2012-01-01 03:00:012012-01-01 15:00:011
John 2012-01-01 23:00:012012-01-02 11:00:013
PE: Please, go back with your performance tests!
Upvotes: 2