fondue222
fondue222

Reputation: 25

SQL: Filter data by datetime

I have a massive table I'm trying to thin out in MS SQL Server Management Studio. Currently there is a row for every 5 seconds of data. I want to shrink the table and only save 1 row for every 30 minutes.

For example, instead of having thousands of rows at

    Timestamp            A   B   C
    2015-01-01 00:00:00  1   5   6
    2015-01-01 00:00:05  5   7   1
    2015-01-01 00:00:10  2   2   3
    2015-01-01 00:00:15  2   1   0
    ......
    2015-04-10 13:55:55  4   5   6
    2015-04-10 13:56:00  6   4   2

I want to thin it out so I only have

    Timestamp            A   B   C
    2015-01-01 00:00:00  1   5   6
    2015-01-01 00:30:00  2   5   7
    2015-01-01 01:00:00  1   7   6
    ......
    2015-04-10 13:30:00  4   5   6
    2015-04-10 14:00:00  6   4   2

Thank you!

Upvotes: 1

Views: 2516

Answers (5)

Blob Larbler
Blob Larbler

Reputation: 1

Here is an example that might help:

CREATE TABLE [dbo].[Position](
[PositionId] [int] IDENTITY(1,1)  PRIMARY KEY,
[Latitude] [float] NOT NULL,
[Longitude] [float] NOT NULL,
[Date] [datetime] NOT NULL,
[DeviceId] [int] NOT NULL
)

insert into Position Values
(51.46,-2.58,'2022-01-04 08:05:00',1),
(51.41,-2.53,'2022-01-04 08:25:00',1),
(51.39,-2.48,'2022-01-04 08:45:00',1),
(51.39,-2.45,'2022-01-04 08:50:00',1),
(51.38,-2.37,'2022-01-04 09:10:00',1),
(51.41,-2.52,'2022-01-04 09:30:00',1),
(51.44,-2.56,'2022-01-04 09:50:00',1),
(51.45,-2.57,'2022-01-04 10:20:00',1),
(51.46,-2.58,'2022-01-04 11:40:00',1)

Declare @IntervalHours int = 1
Declare @Start Datetime = '2022-01-04 07:00'
Declare @End Datetime = '2022-01-04 12:00'
Declare @DeviceId int = 1

SELECT PositionId, [Date], RoundedDate, Latitude, Longitude, DeviceId
FROM (SELECT 
        PositionId, 
        Latitude, 
        Longitude,
        [Date],
        DATEADD(hour, DATEDIFF(hour, 0, [Date]), 0) AS RoundedDate, 
        DeviceId,
        ROW_NUMBER() OVER (PARTITION BY (DATEDIFF(hour, 0, [Date])/@IntervalHours) ORDER BY [Date] ASC) AS row_index
    From Position
    WHERE Date BETWEEN @Start AND @End and DeviceId = @DeviceId
    GROUP BY [Date], PositionId, Latitude,Longitude,DeviceId
    ) as grouped
where grouped.row_index = 1

Upvotes: 0

TommCatt
TommCatt

Reputation: 5636

When selecting one row from a group of rows, it is not always a good idea to write code that relies on assumptions that may not be true. For example, when looking for the first entry on the hour and on the half-hour, is it guaranteed that there will always be an entry with the time stamp "hh-00-00" and "hh-30-00"? Your sample data shows that but does that accurately represent your actual data in that way?

What you may want to say is that you want to keep the "first entry after each hour and half-hour boundary." That way, you know you will have one entry for each half-hour interval, even if the first entry came in at, say, two seconds after the hour or half-hour.

So the first step is to separate the date into half-hour groups. The best way I've found to do that is

Convert( SmallDatetime, Floor( Cast( Timestamp AS float ) * 48.0 ) / 48.0 )

Tack that on the end of a query list and see if the correct time (rounded down to exactly on the hour or exactly on the half-hour). Pay particular attention to times that are very close to (before and after) the boundary.

Using that, it's easy to generate a list of the times that are the first to occur within each half-hour increment. You don't have to worry that each increment has an entry that matches the half-hour boundary to the second. Join on that and you get a complete list of rows to keep. Delete all the rest.

with
Halfs( HalfHour )as(
    select  Min( Timestamp )
    from    TData
    group by Convert( SmallDatetime, Floor( Cast( Timestamp AS float ) * 48.0 ) / 48.0 )
)
select  t.*
from    TData t
join    Halfs h
    on  h.HalfHour = t.TimeStamp;

http://sqlfiddle.com/#!6/65a5c/1/0

Upvotes: 0

George Mastros
George Mastros

Reputation: 24498

If your times don't always and in 0 or 5, you could try this instead.

Delete
From    YourTable
Where   DateDiff(Second, '20000101', TimeStamp) % 1800 > 4

This works by calculating the number of seconds that have expired since Jan 1, 2000. The mod operator is used to calculate the intervals you want to keep. There are 1800 seconds in 30 minutes. Therefore, anything with seconds greater than 4 should be removed. Basically, instead of thinking about time in terms of hours, minutes and seconds, we are thinking about the time in terms of 1800 second intervals, which is what you want.

Upvotes: 0

benjamin moskovits
benjamin moskovits

Reputation: 5458

delete from [massive Table] where datepart(mi,timestamp) not in (0,30)

will delete all rows that don't match your criteria.

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269563

Given your sample data, you could do:

select t.*
from table t
where datepart(minute, timestamp) in (0, 30) and
      datepart(second, timestamp) = 0;

Upvotes: 4

Related Questions