Reputation: 25
I have a massive table I'm trying to thin out in MS SQL Server Management Studio. Currently there is a row for every 5 seconds of data. I want to shrink the table and only save 1 row for every 30 minutes.
For example, instead of having thousands of rows at
Timestamp A B C
2015-01-01 00:00:00 1 5 6
2015-01-01 00:00:05 5 7 1
2015-01-01 00:00:10 2 2 3
2015-01-01 00:00:15 2 1 0
......
2015-04-10 13:55:55 4 5 6
2015-04-10 13:56:00 6 4 2
I want to thin it out so I only have
Timestamp A B C
2015-01-01 00:00:00 1 5 6
2015-01-01 00:30:00 2 5 7
2015-01-01 01:00:00 1 7 6
......
2015-04-10 13:30:00 4 5 6
2015-04-10 14:00:00 6 4 2
Thank you!
Upvotes: 1
Views: 2516
Reputation: 1
Here is an example that might help:
CREATE TABLE [dbo].[Position](
[PositionId] [int] IDENTITY(1,1) PRIMARY KEY,
[Latitude] [float] NOT NULL,
[Longitude] [float] NOT NULL,
[Date] [datetime] NOT NULL,
[DeviceId] [int] NOT NULL
)
insert into Position Values
(51.46,-2.58,'2022-01-04 08:05:00',1),
(51.41,-2.53,'2022-01-04 08:25:00',1),
(51.39,-2.48,'2022-01-04 08:45:00',1),
(51.39,-2.45,'2022-01-04 08:50:00',1),
(51.38,-2.37,'2022-01-04 09:10:00',1),
(51.41,-2.52,'2022-01-04 09:30:00',1),
(51.44,-2.56,'2022-01-04 09:50:00',1),
(51.45,-2.57,'2022-01-04 10:20:00',1),
(51.46,-2.58,'2022-01-04 11:40:00',1)
Declare @IntervalHours int = 1
Declare @Start Datetime = '2022-01-04 07:00'
Declare @End Datetime = '2022-01-04 12:00'
Declare @DeviceId int = 1
SELECT PositionId, [Date], RoundedDate, Latitude, Longitude, DeviceId
FROM (SELECT
PositionId,
Latitude,
Longitude,
[Date],
DATEADD(hour, DATEDIFF(hour, 0, [Date]), 0) AS RoundedDate,
DeviceId,
ROW_NUMBER() OVER (PARTITION BY (DATEDIFF(hour, 0, [Date])/@IntervalHours) ORDER BY [Date] ASC) AS row_index
From Position
WHERE Date BETWEEN @Start AND @End and DeviceId = @DeviceId
GROUP BY [Date], PositionId, Latitude,Longitude,DeviceId
) as grouped
where grouped.row_index = 1
Upvotes: 0
Reputation: 5636
When selecting one row from a group of rows, it is not always a good idea to write code that relies on assumptions that may not be true. For example, when looking for the first entry on the hour and on the half-hour, is it guaranteed that there will always be an entry with the time stamp "hh-00-00" and "hh-30-00"? Your sample data shows that but does that accurately represent your actual data in that way?
What you may want to say is that you want to keep the "first entry after each hour and half-hour boundary." That way, you know you will have one entry for each half-hour interval, even if the first entry came in at, say, two seconds after the hour or half-hour.
So the first step is to separate the date into half-hour groups. The best way I've found to do that is
Convert( SmallDatetime, Floor( Cast( Timestamp AS float ) * 48.0 ) / 48.0 )
Tack that on the end of a query list and see if the correct time (rounded down to exactly on the hour or exactly on the half-hour). Pay particular attention to times that are very close to (before and after) the boundary.
Using that, it's easy to generate a list of the times that are the first to occur within each half-hour increment. You don't have to worry that each increment has an entry that matches the half-hour boundary to the second. Join on that and you get a complete list of rows to keep. Delete all the rest.
with
Halfs( HalfHour )as(
select Min( Timestamp )
from TData
group by Convert( SmallDatetime, Floor( Cast( Timestamp AS float ) * 48.0 ) / 48.0 )
)
select t.*
from TData t
join Halfs h
on h.HalfHour = t.TimeStamp;
http://sqlfiddle.com/#!6/65a5c/1/0
Upvotes: 0
Reputation: 24498
If your times don't always and in 0 or 5, you could try this instead.
Delete
From YourTable
Where DateDiff(Second, '20000101', TimeStamp) % 1800 > 4
This works by calculating the number of seconds that have expired since Jan 1, 2000. The mod operator is used to calculate the intervals you want to keep. There are 1800 seconds in 30 minutes. Therefore, anything with seconds greater than 4 should be removed. Basically, instead of thinking about time in terms of hours, minutes and seconds, we are thinking about the time in terms of 1800 second intervals, which is what you want.
Upvotes: 0
Reputation: 5458
delete from [massive Table] where datepart(mi,timestamp) not in (0,30)
will delete all rows that don't match your criteria.
Upvotes: 0
Reputation: 1269563
Given your sample data, you could do:
select t.*
from table t
where datepart(minute, timestamp) in (0, 30) and
datepart(second, timestamp) = 0;
Upvotes: 4