Reputation: 1031
I have SQL Server Table named table1 which has one timestamp column column_ts and some more columns say column1, column2, column3
So table looks like:
column_ts column1 column2 column3
2016-09-30 00:04:00.000 number1 string1 integer1
2016-09-30 00:24:00.000 number2 string2 integer2
2016-09-30 00:29:00.000 number3 string3 integer3
2016-09-30 00:44:00.000 number4 string4 integer4
2016-09-30 00:48:00.000 number5 string5 integer5
2016-09-30 01:04:00.000 number6 string6 integer6
2016-09-30 01:24:00.000 number7 string7 integer7
2016-09-30 01:54:00.000 number8 string8 integer8
2016-09-30 01:59:00.000 number9 string9 integer9
First, I will select records where column_ts >= 2016-09-30 00:00:00.000
. Then out of these, I want to select only one row that has highest time stamp from each 30 minute window of column_ts.
So for the given data, the query should select only following rows:
column_ts column1 column2 column3
2016-09-18 00:29:00.000 number3 string3 integer3
2016-09-18 00:48:00.000 number5 string5 integer5
2016-09-18 01:24:00.000 number7 string7 integer7
2016-09-18 01:59:00.000 number9 string9 integer9
In a way, I want to make 30 minute windows of column_ts like
1) 2016-09-30 00:00:00.000 - 2016-09-30 00:30:00.000
2) 2016-09-30 00:30:00.000 - 2016-09-30 01:00:00.000
3) 2016-09-30 01:00:00.000 - 2016-09-30 01:30:00.000
4) 2016-09-30 01:30:00.000 - 2016-09-30 02:00:00.000
And finally want to select one row which has highest value for column_ts from each of these 30 minute windows.
I am not able to figure out how I can generate the 30 minute window from which I can select MAX(column_ts)
. Please suggest how I can do this.
Upvotes: 2
Views: 1439
Reputation: 17915
@petelids's answer looks right to me but I'll offer an alternative that doesn't use literal dates in the calculations. I suppose you might even think it reads a little clearer. Based on your sample data I'm assuming you aren't storing seconds. It's also possible you could just ignore the seconds in the output with some formatting options. Seconds are irrelevant for the group by
either way.
select
dateadd
minute,
-datepart(minute, min(column_ts)) % 30,
min(column_ts)
) as timegroup,
max(column_ts) as max_time_in_window
from T
group by
cast(column_ts as date),
datepart(hour, column_ts),
datepart(minute, column_ts) / 30;
EDIT
Upon re-reading your question I realized that you want the entire row as your result. You can still use this approach although the row_number()
technique is probably more common these days and likely to be very fast.
select * from T
where column_ts in (
select max(column_ts) as max_time_in_window
from T
group by
cast(column_ts as date),
datepart(hour, column_ts),
datepart(minute, column_ts) / 30
);
or using row_number()
:
with data as (
select *,
row_number() over (
partition by
cast(column_ts as date),
datepart(hour, column_ts),
datepart(minute, column_ts) / 30
order by
column_ts
) as rn
)
select *
from data
where rn = 1;
Upvotes: 1
Reputation: 14341
;WITH cte AS (
SELECT
*
,ROW_NUMBER() OVER (PARTITION BY
CASE
WHEN DATEPART(MINUTE,column_ts) > 30 THEN DATEADD(MINUTE,30 - DATEPART(MINUTE,column_ts),column_ts)
ELSE DATEADD(MINUTE,- DATEPART(MINUTE,column_ts),column_ts)
END
ORDER BY column_ts DESC) as RowNumber
FROM
@Table1
)
SELECT *
FROM
cte
WHERE
RowNumber = 1
You can as others are showing generate a table of every 30 minutes but the reality is you only need to round down to hour mark if less than 30 minutes or round to 30 minutes if above. That will create the grouping. so no need for recursive cte.
CASE
WHEN DATEPART(MINUTE,column_ts) => 30 THEN DATEADD(MINUTE,30 - DATEPART(MINUTE,column_ts),column_ts)
ELSE DATEADD(MINUTE,- DATEPART(MINUTE,column_ts),column_ts)
END as HalfHourGroup
Upvotes: 1
Reputation: 826
I did this by generating the table of "intervals" separately as a CTE. If you are doing this a lot, you may want to "persist" the intervals in a table so you can join to them. You should also give some thought to what you want to happen when there are two events with the same timestamp...
DECLARE @theDayInQuestion datetime = '2016-09-30';
WITH ints
AS (SELECT
0 AS n
UNION ALL
SELECT
n + 30
FROM ints
WHERE n + 30 < 1440),
LastTimestampInEachInterval
AS (SELECT
DATEADD(MINUTE, n, @theDayInQuestion) AS StartInterval,
DATEADD(MINUTE, n + 30, @theDayInQuestion) AS EndInterval,
MAX(t.column_ts) AS LastTimeStamp
FROM ints
LEFT JOIN t
ON t.column_ts BETWEEN
DATEADD(MINUTE, n, @theDayInQuestion) --StartInterval
AND
DATEADD(MINUTE, n + 30, @theDayInQuestion) --EndInterval
GROUP BY DATEADD(MINUTE, n, @theDayInQuestion),
DATEADD(MINUTE, n + 30, @theDayInQuestion))
SELECT
*
FROM LastTimestampInEachInterval
LEFT JOIN t
ON LastTimeStampInEachInterval.LastTimeStamp = t.column_ts
(Warning: Script may not work tomorrow...)
Upvotes: 0
Reputation: 12815
You can take the date difference in minutes from an epoch and then divide that by 30 to group by 30 minute intervals.
This query will give each 30 minute slot along with the max column_ts for that slot:
select dateadd(minute, datediff(minute, '1970-1-1',column_ts)/30*30,'1970-1-1') as timegroup,
MAX(column_ts) as max_time
from table1 where column_ts >= '2016-09-30 00:00:00.000'
group by datediff(minute, '1970-1-1', column_ts) / 30
The above produces:
timegroup max_time
2016-09-30 00:00:00.000 2016-09-30 00:29:00.000
2016-09-30 00:30:00.000 2016-09-30 00:48:00.000
2016-09-30 01:00:00.000 2016-09-30 01:24:00.000
2016-09-30 01:30:00.000 2016-09-30 01:59:00.000
Once you have that, you can use it in a sub-query to get the results you are after:
select groups.timegroup, t.column_ts, t.column1, t.column2, t.column3
from (
select dateadd(minute, datediff(minute, '1970-1-1',column_ts)/30*30,'1970-1-1') as timegroup,MAX(column_ts) as max_time
from table1 where column_ts >= '2016-09-30 00:00:00.000'
group by datediff(minute, '1970-1-1', column_ts) / 30
) as groups
inner join table1 t on t.column_ts = groups.max_time
Which produces
timegroup column_ts column1 column2 column3
2016-09-30 00:00:00.000 2016-09-30 00:29:00.000 number3 string3 integer3
2016-09-30 00:30:00.000 2016-09-30 00:48:00.000 number5 string5 integer5
2016-09-30 01:00:00.000 2016-09-30 01:24:00.000 number7 string7 integer7
2016-09-30 01:30:00.000 2016-09-30 01:59:00.000 number9 string9 integer9
Upvotes: 4
Reputation: 7392
I would generate an interval table, and join that to your data. Then add a row_number()
for each interval ordered by the column_ts
in a descending manner, returning only the highest values (RN=1).
DECLARE @Test TABLE (column_ts datetime, column1 varchar(50), column2 varchar(50), column3 varchar(50))
INSERT INTO @Test
VALUES ('2016-09-30 00:04:00.000','number1','string1','integer1'),
('2016-09-30 00:24:00.000','number2','string2','integer2'),
('2016-09-30 00:29:00.000','number3','string3','integer3'),
('2016-09-30 00:44:00.000','number4','string4','integer4'),
('2016-09-30 00:48:00.000','number5','string5','integer5'),
('2016-09-30 01:04:00.000','number6','string6','integer6'),
('2016-09-30 01:24:00.000','number7','string7','integer7'),
('2016-09-30 01:54:00.000','number8','string8','integer8'),
('2016-09-30 01:59:00.000','number9','string9','integer9')
DECLARE @TimeGrid TABLE (IntervalStart TIME, IntervalEnd TIME)
DECLARE @MyTime TIME, @true BIT=1
WHILE @true=1
BEGIN
IF @MyTime IS NULL SET @MyTime = CONVERT(TIME,'00:00:00')
INSERT INTO @TimeGrid (IntervalStart,IntervalEnd)
SELECT @MyTime, DATEADD(NS,-100,DATEADD(MI,30,@MyTime))
SET @MyTime=DATEADD(MI,30,@MyTime)
IF @MyTime= CONVERT(TIME,'00:00:00')
SET @true=0
END
;WITH X AS
(
SELECT *
FROM @Test T
JOIN @TimeGrid TG ON CONVERT(TIME,T.column_ts) BETWEEN TG.IntervalStart AND TG.IntervalEnd
), Y AS
(
SELECT *,
ROW_NUMBER() OVER(PARTITION BY IntervalStart ORDER BY column_ts DESC) AS RN
FROM X
)
SELECT column_ts, column1, column2, column3--, IntervalStart, IntervalEnd, RN
FROM Y
WHERE RN=1
Upvotes: 1
Reputation: 1630
Assuming you are using sql server 2005+, here is the script
use tempdb
--drop table dbo.t
create table dbo.t (column_ts datetime, column1 varchar(30), column2 varchar(30), column3 varchar(30));
go
-- populate the table
insert into dbo.t (column_ts, column1, column2, column3)
select '2016-09-30 00:04:00.000','number1','string1','integer1'
union all select '2016-09-30 00:24:00.000','number2','string2','integer2'
union all select '2016-09-30 00:29:00.000','number3','string3','integer3'
union all select '2016-09-30 00:44:00.000','number4','string4','integer4'
union all select '2016-09-30 00:48:00.000','number5','string5','integer5'
union all select '2016-09-30 01:04:00.000','number6','string6','integer6'
union all select '2016-09-30 01:24:00.000','number7','string7','integer7'
union all select '2016-09-30 01:54:00.000','number8','string8','integer8'
union all select '2016-09-30 01:59:00.000','number9','string9','integer9';
go
-- the query
; with c as (
select section=datediff(minute, '2016-09-30', column_ts)/30, * from dbo.t
)
, c2 as (select rnk=rank() over (partition by section order by column_ts desc), * from c)
select column_ts, column1, column2, column3
from c2
where rnk = 1;
I did the similar thing before when I need to find out the most expensive query for each 30 min window after I collected the performance trace.
Upvotes: 2
Reputation: 424983
Can be done without windowing functions :
select max(column_ts) column_ts, column1, column2, column3
from mytable
where column_ts >= 2016-09-30 00:00:00.000
group by column1, column2, column3
To get results across multiple time brackets, group by the bracket too:
select max(column_ts) column_ts, column1, column2, column3
from mytable
group by column1, column2, column3, <expression to calculate a unique value for each column_ts bracket>
Upvotes: 0