300
300

Reputation: 1031

How to select rows with highest time stamp in 30 minute window?

I have SQL Server Table named table1 which has one timestamp column column_ts and some more columns say column1, column2, column3

So table looks like:

column_ts                   column1     column2     column3
2016-09-30 00:04:00.000     number1     string1     integer1
2016-09-30 00:24:00.000     number2     string2     integer2
2016-09-30 00:29:00.000     number3     string3     integer3
2016-09-30 00:44:00.000     number4     string4     integer4
2016-09-30 00:48:00.000     number5     string5     integer5
2016-09-30 01:04:00.000     number6     string6     integer6
2016-09-30 01:24:00.000     number7     string7     integer7
2016-09-30 01:54:00.000     number8     string8     integer8
2016-09-30 01:59:00.000     number9     string9     integer9

First, I will select records where column_ts >= 2016-09-30 00:00:00.000. Then out of these, I want to select only one row that has highest time stamp from each 30 minute window of column_ts.

So for the given data, the query should select only following rows:

column_ts                   column1     column2     column3
2016-09-18 00:29:00.000     number3     string3     integer3
2016-09-18 00:48:00.000     number5     string5     integer5
2016-09-18 01:24:00.000     number7     string7     integer7
2016-09-18 01:59:00.000     number9     string9     integer9

In a way, I want to make 30 minute windows of column_ts like

1) 2016-09-30 00:00:00.000 - 2016-09-30 00:30:00.000
2) 2016-09-30 00:30:00.000 - 2016-09-30 01:00:00.000
3) 2016-09-30 01:00:00.000 - 2016-09-30 01:30:00.000
4) 2016-09-30 01:30:00.000 - 2016-09-30 02:00:00.000

And finally want to select one row which has highest value for column_ts from each of these 30 minute windows.

I am not able to figure out how I can generate the 30 minute window from which I can select MAX(column_ts). Please suggest how I can do this.

Upvotes: 2

Views: 1439

Answers (7)

shawnt00
shawnt00

Reputation: 17915

@petelids's answer looks right to me but I'll offer an alternative that doesn't use literal dates in the calculations. I suppose you might even think it reads a little clearer. Based on your sample data I'm assuming you aren't storing seconds. It's also possible you could just ignore the seconds in the output with some formatting options. Seconds are irrelevant for the group by either way.

select
    dateadd
        minute,
        -datepart(minute, min(column_ts)) % 30,
        min(column_ts)
    ) as timegroup,
    max(column_ts) as max_time_in_window
from T
group by
    cast(column_ts as date),
    datepart(hour, column_ts),
    datepart(minute, column_ts) / 30;

EDIT Upon re-reading your question I realized that you want the entire row as your result. You can still use this approach although the row_number() technique is probably more common these days and likely to be very fast.

select * from T
where column_ts in (
    select max(column_ts) as max_time_in_window
    from T
    group by
        cast(column_ts as date),
        datepart(hour, column_ts),
        datepart(minute, column_ts) / 30
);

or using row_number():

with data as (
    select *,
        row_number() over (
            partition by
                cast(column_ts as date),
                datepart(hour, column_ts),
                datepart(minute, column_ts) / 30
            order by
                column_ts
        ) as rn
)
select *
from data
where rn = 1;

Upvotes: 1

Matt
Matt

Reputation: 14341

;WITH cte AS (
    SELECT
       *
       ,ROW_NUMBER() OVER (PARTITION BY
                CASE
                    WHEN DATEPART(MINUTE,column_ts) > 30 THEN DATEADD(MINUTE,30 - DATEPART(MINUTE,column_ts),column_ts)
                    ELSE DATEADD(MINUTE,- DATEPART(MINUTE,column_ts),column_ts)
                END
             ORDER BY column_ts DESC) as RowNumber
    FROM
       @Table1
)

SELECT *
FROM
    cte
WHERE
    RowNumber = 1

You can as others are showing generate a table of every 30 minutes but the reality is you only need to round down to hour mark if less than 30 minutes or round to 30 minutes if above. That will create the grouping. so no need for recursive cte.

CASE
     WHEN DATEPART(MINUTE,column_ts) => 30 THEN DATEADD(MINUTE,30 - DATEPART(MINUTE,column_ts),column_ts)
     ELSE DATEADD(MINUTE,- DATEPART(MINUTE,column_ts),column_ts)
END as HalfHourGroup

Upvotes: 1

Gavin Campbell
Gavin Campbell

Reputation: 826

I did this by generating the table of "intervals" separately as a CTE. If you are doing this a lot, you may want to "persist" the intervals in a table so you can join to them. You should also give some thought to what you want to happen when there are two events with the same timestamp...

DECLARE @theDayInQuestion datetime = '2016-09-30';
WITH ints
AS (SELECT
  0 AS n
UNION ALL
SELECT
  n + 30
FROM ints
WHERE n + 30 < 1440),

LastTimestampInEachInterval
AS (SELECT
  DATEADD(MINUTE, n, @theDayInQuestion) AS StartInterval,
  DATEADD(MINUTE, n + 30, @theDayInQuestion) AS EndInterval,
  MAX(t.column_ts) AS LastTimeStamp


FROM ints
LEFT JOIN t
  ON t.column_ts BETWEEN
  DATEADD(MINUTE, n, @theDayInQuestion) --StartInterval
  AND
  DATEADD(MINUTE, n + 30, @theDayInQuestion) --EndInterval
GROUP BY DATEADD(MINUTE, n, @theDayInQuestion),
         DATEADD(MINUTE, n + 30, @theDayInQuestion))
SELECT
  *
FROM LastTimestampInEachInterval
LEFT JOIN t
  ON LastTimeStampInEachInterval.LastTimeStamp = t.column_ts

(Warning: Script may not work tomorrow...)

Upvotes: 0

petelids
petelids

Reputation: 12815

You can take the date difference in minutes from an epoch and then divide that by 30 to group by 30 minute intervals.

This query will give each 30 minute slot along with the max column_ts for that slot:

select dateadd(minute, datediff(minute, '1970-1-1',column_ts)/30*30,'1970-1-1') as timegroup,
       MAX(column_ts) as max_time
from table1 where column_ts >= '2016-09-30 00:00:00.000'
group by datediff(minute, '1970-1-1', column_ts) / 30

The above produces:

timegroup                   max_time
2016-09-30 00:00:00.000     2016-09-30 00:29:00.000
2016-09-30 00:30:00.000     2016-09-30 00:48:00.000
2016-09-30 01:00:00.000     2016-09-30 01:24:00.000
2016-09-30 01:30:00.000     2016-09-30 01:59:00.000

Once you have that, you can use it in a sub-query to get the results you are after:

select groups.timegroup, t.column_ts, t.column1, t.column2, t.column3 
from (
    select dateadd(minute, datediff(minute, '1970-1-1',column_ts)/30*30,'1970-1-1') as timegroup,MAX(column_ts) as max_time
    from table1 where column_ts >= '2016-09-30 00:00:00.000'
    group by datediff(minute, '1970-1-1', column_ts) / 30
) as groups
inner join table1 t on t.column_ts = groups.max_time

Which produces

timegroup                   column_ts                   column1   column2   column3
2016-09-30 00:00:00.000     2016-09-30 00:29:00.000     number3   string3   integer3
2016-09-30 00:30:00.000     2016-09-30 00:48:00.000     number5   string5   integer5
2016-09-30 01:00:00.000     2016-09-30 01:24:00.000     number7   string7   integer7
2016-09-30 01:30:00.000     2016-09-30 01:59:00.000     number9   string9   integer9

Upvotes: 4

Dave C
Dave C

Reputation: 7392

I would generate an interval table, and join that to your data. Then add a row_number() for each interval ordered by the column_ts in a descending manner, returning only the highest values (RN=1).

DECLARE @Test TABLE (column_ts datetime, column1 varchar(50), column2 varchar(50), column3 varchar(50))
INSERT INTO @Test
VALUES ('2016-09-30 00:04:00.000','number1','string1','integer1'),
       ('2016-09-30 00:24:00.000','number2','string2','integer2'),
       ('2016-09-30 00:29:00.000','number3','string3','integer3'),
       ('2016-09-30 00:44:00.000','number4','string4','integer4'),
       ('2016-09-30 00:48:00.000','number5','string5','integer5'),
       ('2016-09-30 01:04:00.000','number6','string6','integer6'),
       ('2016-09-30 01:24:00.000','number7','string7','integer7'),
       ('2016-09-30 01:54:00.000','number8','string8','integer8'),
       ('2016-09-30 01:59:00.000','number9','string9','integer9')

DECLARE @TimeGrid TABLE (IntervalStart TIME, IntervalEnd TIME)

DECLARE @MyTime TIME, @true BIT=1

WHILE @true=1
BEGIN
    IF @MyTime IS NULL SET @MyTime = CONVERT(TIME,'00:00:00')

    INSERT INTO @TimeGrid (IntervalStart,IntervalEnd)
    SELECT @MyTime, DATEADD(NS,-100,DATEADD(MI,30,@MyTime))

    SET @MyTime=DATEADD(MI,30,@MyTime)
    IF @MyTime= CONVERT(TIME,'00:00:00')
        SET @true=0
END

;WITH X AS
(
    SELECT * 
    FROM @Test T
    JOIN @TimeGrid TG ON CONVERT(TIME,T.column_ts) BETWEEN TG.IntervalStart AND TG.IntervalEnd
), Y AS
    (
        SELECT *,
               ROW_NUMBER() OVER(PARTITION BY IntervalStart ORDER BY column_ts DESC) AS RN
        FROM X
    )

SELECT column_ts, column1, column2, column3--, IntervalStart, IntervalEnd, RN
FROM Y
WHERE RN=1

Upvotes: 1

jyao
jyao

Reputation: 1630

Assuming you are using sql server 2005+, here is the script

use tempdb
--drop table dbo.t
create table dbo.t (column_ts datetime, column1 varchar(30), column2 varchar(30), column3 varchar(30));
go
-- populate the table
insert into dbo.t (column_ts, column1, column2, column3)
select '2016-09-30 00:04:00.000','number1','string1','integer1'
union all select '2016-09-30 00:24:00.000','number2','string2','integer2'
union all select '2016-09-30 00:29:00.000','number3','string3','integer3'
union all select '2016-09-30 00:44:00.000','number4','string4','integer4'
union all select '2016-09-30 00:48:00.000','number5','string5','integer5'
union all select '2016-09-30 01:04:00.000','number6','string6','integer6'
union all select '2016-09-30 01:24:00.000','number7','string7','integer7'
union all select '2016-09-30 01:54:00.000','number8','string8','integer8'
union all select '2016-09-30 01:59:00.000','number9','string9','integer9';
go

-- the query
; with c as (
select section=datediff(minute, '2016-09-30', column_ts)/30, * from dbo.t
)
, c2 as (select rnk=rank() over (partition by section order by column_ts desc), * from c)
select column_ts, column1, column2, column3
from c2 
where rnk = 1;

I did the similar thing before when I need to find out the most expensive query for each 30 min window after I collected the performance trace.

Upvotes: 2

Bohemian
Bohemian

Reputation: 424983

Can be done without windowing functions :

select max(column_ts) column_ts, column1, column2, column3
from mytable
where column_ts >= 2016-09-30 00:00:00.000
group by column1, column2, column3

To get results across multiple time brackets, group by the bracket too:

select max(column_ts) column_ts, column1, column2, column3
from mytable
group by column1, column2, column3, <expression to calculate a unique value for each column_ts bracket>

Upvotes: 0

Related Questions