Reputation: 2244
So, I've got some history tables that have a begin date and an end date. The problem is, that there's several records in this table that refer to the same thing, but their begin dates and end dates are not exact. So, I'm trying to unify their begin and end dates.
So, each set of record will have close begin and end dates (within about 7 seconds). Then there will be another cluster, with the same key (in this case, VoyageID), but a different set of close begin and end dates. Does that make sense? I can post some sample data if it doesn't.
Anyway, my goal right now is to find the minimum begin date for each cluster. What I have now gets me the minimum for each VoyageID. Any help would be appreciated. Thanks!
Here's what I have:
DECLARE @7S DATETIME
SET @7S = '0:0:07'
PRINT @7S
SELECT MAX(T1.BeginDate), T1.VoyageID FROM
hist.VoyageProfitLossValues T1 INNER JOIN
hist.VoyageProfitLossValues T2 ON
T1.VoyageID = T2.VoyageID AND
T1.BeginDate BETWEEN (T2.BeginDate - @7S) and (T2.BeginDate + @7S)
GROUP BY T1.VoyageID
EDIT: Sample data:
BeginDate EndDate VoyageID
2011-07-05 07:02:50.713 2011-07-05 07:25:53.007 6312
2011-07-05 07:02:50.870 2011-07-05 07:25:53.693 6312
2011-07-05 07:02:51.027 2011-07-05 07:25:54.387 6312
2011-07-08 14:22:21.147 NULL 6312
2011-07-08 14:22:21.163 NULL 6312
2011-07-08 14:22:21.177 NULL 6312
Note: The real data has more than 3 per each voyage, and the BeginDates can be further apart.
And I would want out of this:
BeginDate VoyageID
2011-07-05 07:02:50.713 6312
2011-07-08 14:22:21.147 6312
What I have will just give me the first line.
I'll eventually do this with the end date, as well, but I can convert one to the other easily.
Upvotes: 1
Views: 2729
Reputation: 138960
The idea of this solution is to order your rows on BeginDate
for each VoyageID
. Go from the top and pick the rows that have a time diff of more than 7 seconds to the previous row.
@Voy
is instead of hist.VoyageProfitLossValues
. First I create a temp table #T
that will fill the ID
column with ordered values for each VoyageID
. C
is a recursive CTE that starts at ID = 1
and runs through all rows comparing current row with the previous row and storing the result in column FirstDate
. I added a second VoyageID
to the sample data just to prove it works with that as well.
declare @Voy table
(
BeginDate datetime,
EndDate datetime,
VoyageID int
)
insert into @Voy values
('2011-07-05 07:02:50.713', '2011-07-05 07:25:53.007', 6312),
('2011-07-05 07:02:50.870', '2011-07-05 07:25:53.693', 6312),
('2011-07-05 07:02:51.027', '2011-07-05 07:25:54.387', 6312),
('2011-07-08 14:22:21.147', NULL , 6312),
('2011-07-08 14:22:21.163', NULL , 6312),
('2011-07-08 14:22:21.177', NULL , 6312),
('2011-07-05 07:02:50.713', '2011-07-05 07:25:53.007', 6313),
('2011-07-05 07:02:50.870', '2011-07-05 07:25:53.693', 6313),
('2011-07-05 07:02:51.027', '2011-07-05 07:25:54.387', 6313),
('2011-07-08 14:22:21.147', NULL , 6313),
('2011-07-08 14:22:21.163', NULL , 6313),
('2011-07-08 14:22:21.177', NULL , 6313)
create table #T
(
ID int,
VoyageID int,
BeginDate datetime
primary key (ID, VoyageID)
)
insert into #T (ID, VoyageID, BeginDate)
select row_number() over(partition by VoyageID order by BeginDate),
VoyageID,
BeginDate
from @Voy
;with C as
(
select T.ID,
T.VoyageID,
T.BeginDate,
1 as FirstDate
from #T as T
where T.ID = 1
union all
select T.ID,
T.VoyageID,
T.BeginDate,
case when datediff(second, C.BeginDate, T.BeginDate) > 7 then 1 else 0 end
from #T as T
inner join C
on T.ID = C.ID + 1 and
T.VoyageID = C.VoyageID
)
select C.BeginDate,
C.VoyageID
from C
where C.FirstDate = 1
order by C.VoyageID,
C.BeginDate
option (maxrecursion 0)
drop table #T
Result:
BeginDate VoyageID
----------------------- -----------
2011-07-05 07:02:50.713 6312
2011-07-08 14:22:21.147 6312
2011-07-05 07:02:50.713 6313
2011-07-08 14:22:21.147 6313
Upvotes: 2
Reputation: 51655
This approach uses a Cursor. I don't know if it is a valid solution for you:
create table #datacluster (
dateCluster datetime,
dateV datetime primary key)
DECLARE @7S DATETIME
DECLARE @base DATETIME
DECLARE @begindate DATETIME
SELECT @base = SYSDATETIME()
SET @7S = '0:0:07'
DECLARE cursor1 CURSOR
FAST_FORWARD READ_ONLY FOR
SELECT distinct T1.BeginDate
FROM
hist.VoyageProfitLossValues T1
ORDER BY T1.BeginDate DESC
FETCH NEXT FROM cursor1
INTO @begindate;
WHILE @@FETCH_STATUS = 0
BEGIN
IF @base - @7S > @begindate
BEGIN
set @base = @begindate
END
insert into #datacluster ( dateCluster, dateV)
values (@base, @begindate)
FETCH NEXT FROM cursor1
INTO @begindate;
END
Update VoyageProfitLossValues table from #dataCluster:
UPDATE hist.VoyageProfitLossValues
SET BeginDate = (
SELECT C.BeginDate
FROM #datacluster C
WHERE
C.dateV = hist.VoyageProfitLossValues.BeginDate
)
Note 1: Not tested!!
Optimized:
primary key on temporary table. fast forward read only cursor.
Upvotes: 0