Colin DeClue
Colin DeClue

Reputation: 2244

Find the minimum date in a range of dates with sql

So, I've got some history tables that have a begin date and an end date. The problem is, that there's several records in this table that refer to the same thing, but their begin dates and end dates are not exact. So, I'm trying to unify their begin and end dates.

So, each set of record will have close begin and end dates (within about 7 seconds). Then there will be another cluster, with the same key (in this case, VoyageID), but a different set of close begin and end dates. Does that make sense? I can post some sample data if it doesn't.

Anyway, my goal right now is to find the minimum begin date for each cluster. What I have now gets me the minimum for each VoyageID. Any help would be appreciated. Thanks!

Here's what I have:

DECLARE @7S DATETIME
SET @7S = '0:0:07'

PRINT @7S

SELECT MAX(T1.BeginDate), T1.VoyageID FROM
hist.VoyageProfitLossValues T1 INNER JOIN
hist.VoyageProfitLossValues T2 ON
T1.VoyageID = T2.VoyageID AND
T1.BeginDate BETWEEN (T2.BeginDate - @7S) and (T2.BeginDate + @7S)
GROUP BY T1.VoyageID

EDIT: Sample data:

BeginDate                   EndDate                    VoyageID
2011-07-05 07:02:50.713     2011-07-05 07:25:53.007    6312
2011-07-05 07:02:50.870     2011-07-05 07:25:53.693    6312
2011-07-05 07:02:51.027     2011-07-05 07:25:54.387    6312
2011-07-08 14:22:21.147     NULL                       6312
2011-07-08 14:22:21.163     NULL                       6312
2011-07-08 14:22:21.177     NULL                       6312

Note: The real data has more than 3 per each voyage, and the BeginDates can be further apart.

And I would want out of this:

BeginDate                   VoyageID
2011-07-05 07:02:50.713     6312
2011-07-08 14:22:21.147     6312

What I have will just give me the first line.

I'll eventually do this with the end date, as well, but I can convert one to the other easily.

Upvotes: 1

Views: 2729

Answers (2)

Mikael Eriksson
Mikael Eriksson

Reputation: 138960

The idea of this solution is to order your rows on BeginDate for each VoyageID. Go from the top and pick the rows that have a time diff of more than 7 seconds to the previous row.

@Voy is instead of hist.VoyageProfitLossValues. First I create a temp table #T that will fill the ID column with ordered values for each VoyageID. C is a recursive CTE that starts at ID = 1 and runs through all rows comparing current row with the previous row and storing the result in column FirstDate. I added a second VoyageID to the sample data just to prove it works with that as well.

declare @Voy table
(
  BeginDate datetime,
  EndDate datetime,
  VoyageID int
)

insert into @Voy values  
('2011-07-05 07:02:50.713',     '2011-07-05 07:25:53.007',    6312),
('2011-07-05 07:02:50.870',     '2011-07-05 07:25:53.693',    6312),
('2011-07-05 07:02:51.027',     '2011-07-05 07:25:54.387',    6312),
('2011-07-08 14:22:21.147',      NULL                    ,    6312),
('2011-07-08 14:22:21.163',      NULL                    ,    6312),
('2011-07-08 14:22:21.177',      NULL                    ,    6312),
('2011-07-05 07:02:50.713',     '2011-07-05 07:25:53.007',    6313),
('2011-07-05 07:02:50.870',     '2011-07-05 07:25:53.693',    6313),
('2011-07-05 07:02:51.027',     '2011-07-05 07:25:54.387',    6313),
('2011-07-08 14:22:21.147',      NULL                    ,    6313),
('2011-07-08 14:22:21.163',      NULL                    ,    6313),
('2011-07-08 14:22:21.177',      NULL                    ,    6313)


create table #T
(
  ID int,
  VoyageID int,
  BeginDate datetime
  primary key (ID, VoyageID)
)

insert into #T (ID, VoyageID, BeginDate)
select row_number() over(partition by VoyageID order by BeginDate),
       VoyageID,
       BeginDate
from @Voy     


;with C as
(
  select T.ID,
         T.VoyageID,
         T.BeginDate,
         1 as FirstDate
  from #T as T
  where T.ID = 1
  union all
  select T.ID,
         T.VoyageID,
         T.BeginDate,
         case when datediff(second, C.BeginDate, T.BeginDate) > 7 then 1 else 0 end
  from #T as T
    inner join C
      on T.ID = C.ID + 1 and
         T.VoyageID = C.VoyageID
)
select C.BeginDate,
       C.VoyageID
from C
where C.FirstDate = 1
order by C.VoyageID,
         C.BeginDate
option (maxrecursion 0)


drop table #T

Result:

BeginDate               VoyageID
----------------------- -----------
2011-07-05 07:02:50.713 6312
2011-07-08 14:22:21.147 6312
2011-07-05 07:02:50.713 6313
2011-07-08 14:22:21.147 6313

Upvotes: 2

dani herrera
dani herrera

Reputation: 51655

This approach uses a Cursor. I don't know if it is a valid solution for you:

create table #datacluster ( 
    dateCluster datetime, 
    dateV datetime primary key)

DECLARE @7S DATETIME
DECLARE @base DATETIME
DECLARE @begindate DATETIME

SELECT @base = SYSDATETIME()
SET @7S = '0:0:07'

DECLARE cursor1 CURSOR 
FAST_FORWARD READ_ONLY FOR    
SELECT distinct T1.BeginDate 
FROM
  hist.VoyageProfitLossValues T1 
ORDER BY  T1.BeginDate DESC

FETCH NEXT FROM cursor1 
INTO @begindate;    

WHILE @@FETCH_STATUS = 0
BEGIN

  IF @base - @7S > @begindate
  BEGIN
    set @base = @begindate
  END
  insert into #datacluster ( dateCluster, dateV) 
  values (@base,  @begindate)

  FETCH NEXT FROM cursor1 
  INTO @begindate;    
END

Update VoyageProfitLossValues table from #dataCluster:

UPDATE hist.VoyageProfitLossValues 
SET BeginDate = (
   SELECT C.BeginDate 
   FROM #datacluster C 
   WHERE 
      C.dateV = hist.VoyageProfitLossValues.BeginDate 
  )

Note 1: Not tested!!

Optimized:

primary key on temporary table. fast forward read only cursor.

Upvotes: 0

Related Questions