Reputation: 85
I'm trying to get a dataset put together to feed a GIS application and looking for a SQL Guru to help me avoid using a cursor and temp tables since we have about 30 million records to process (I'm using SQL Server 2012). The main points of the dataset consist of a VehicleID, PositionTime, Latitude, Longitude, and Speed. We need to have the records ordered by VehicleID and PositionTime in order to map the Lat/Long points and track movement. If the Speed is greater than or equal to 0.2, then it's an acceptable record. However, if the speed is less than 0.2, then it's considered as not moving and needs special processing. In the example below, for records one and two, I would need to average the lat, long, and speed, and also keep the maximum position time and put that into a single record. The third and fourth records are fine as is. Records five and six would also need to be combined into one record, so vehicle 1 would go from 6 records to 4 records. After the records are "normalized", we then need to get the date difference between records for each vehicle and position time. The same process would start over again for VehicleID 2 (we have about 7,000 vehicles).
VehicleID PositionTime Latitude Longitude Speed 1 11/20/2013 18:09:27 29.54608 -95.04444 0.1 1 11/20/2013 18:47:35 29.54608 -95.04444 0 1 11/20/2013 20:34:45 29.546105 -95.04442 5 1 11/20/2013 20:46:44 29.54607833 -95.04443167 3 1 11/20/2013 21:01:44 29.54606667 -95.04442833 0 1 11/20/2013 21:16:43 29.546095 -95.04443167 0.1 2 11/20/2013 21:31:44 29.54609 -95.04441 5 2 11/20/2013 21:46:44 29.54607667 -95.04443 0
Upvotes: 0
Views: 94
Reputation: 35790
Ok, here is some code:
DECLARE @t TABLE
(
VehicleID INT ,
PositionTime DATETIME ,
Latitude DECIMAL(20, 10) ,
Longitude DECIMAL(20, 10) ,
Speed DECIMAL(20, 10)
)
INSERT INTO @t
VALUES ( 1, '11/20/2013 18:09:27', 29.54608, -95.04444, 0.1 ),
( 1, '11/20/2013 18:47:35', 29.54608, -95.04444, 0 ),
( 1, '11/20/2013 20:34:45', 29.546105, -95.04442, 5 ),
( 1, '11/20/2013 20:46:44', 29.54607833, -95.04443167, 3 ),
( 1, '11/20/2013 21:01:44', 29.54606667, -95.04442833, 0 ),
( 1, '11/20/2013 21:16:43', 29.546095, -95.04443167, 0.1 ),
( 2, '11/20/2013 21:31:44', 29.54609, -95.04441, 5 ),
( 2, '11/20/2013 21:46:44', 29.54607667, -95.04443, 0 );
WITH cte1
AS ( SELECT VehicleID ,
PositionTime ,
Latitude ,
Longitude ,
CASE WHEN Speed <= 0.2 THEN 0
ELSE Speed
END AS Speed
FROM @t
),
cte2
AS ( SELECT * ,
SUM(Speed) OVER ( PARTITION BY VehicleID ORDER BY PositionTime ) AS s
FROM cte1
),
cte3
AS ( SELECT * ,
RANK() OVER ( PARTITION BY VehicleID ORDER BY Speed, s ) AS r
FROM cte2
),
cte4
AS ( SELECT VehicleID ,
MAX(PositionTime) AS PositionTime ,
AVG(Latitude) AS Latitude ,
AVG(Longitude) AS Longitude ,
MAX(Speed) AS Speed
FROM cte3
GROUP BY VehicleID ,
r
)
SELECT * ,
DATEDIFF(ss,
LAG(PositionTime) OVER ( PARTITION BY VehicleID ORDER BY PositionTime ),
PositionTime) AS DiffInSeconds
FROM cte4
ORDER BY PositionTime
And output:
VehicleID PositionTime Latitude Longitude Speed DiffInSeconds
1 2013-11-20 18:47:35.000 29.5460800000 -95.0444400000 0.0000000000 NULL
1 2013-11-20 20:34:45.000 29.5461050000 -95.0444200000 5.0000000000 6430
1 2013-11-20 20:46:44.000 29.5460783300 -95.0444316700 3.0000000000 719
1 2013-11-20 21:16:43.000 29.5460808350 -95.0444300000 0.0000000000 1799
2 2013-11-20 21:31:44.000 29.5460900000 -95.0444100000 5.0000000000 NULL
2 2013-11-20 21:46:44.000 29.5460766700 -95.0444300000 0.0000000000 900
Let's go through it.
In cte1
I just cast all values of Speed <=0.2
to 0
In cte2
I summ Speed
values, so I get same values where Speed = 0
Speed s
0.0000000000 0.0000000000
0.0000000000 0.0000000000
5.0000000000 5.0000000000
3.0000000000 8.0000000000
0.0000000000 8.0000000000
0.0000000000 8.0000000000
5.0000000000 5.0000000000
0.0000000000 5.0000000000
Then in cte3
I am applying ranking function to combination of Speed
and incremental sum s
. So I get:
VehicleID Speed s r
1 0.0000000000 0.0000000000 1
1 0.0000000000 0.0000000000 1
1 0.0000000000 8.0000000000 3
1 0.0000000000 8.0000000000 3
1 3.0000000000 8.0000000000 5
1 5.0000000000 5.0000000000 6
2 0.0000000000 5.0000000000 1
2 5.0000000000 5.0000000000 2
Notice that you have same r
where Speed
is 0, so you can apply grouping here by VehicleID
and r
. I do that in cte4
to get distinct rows and apply aggregation like Max
on date and Avg
on lat and long.
And finally there is select from cte4 that calculates difference between current row and the row before it using LAG
function.
Upvotes: 1
Reputation: 2694
I think all you are trying to do is aggregate the slow/non-moving vehicles
Does this help?
Create or replace view vehicles_view as
select VehicleId,
Max(PositionTime) as PositionTime,
avg(Latitude) as Latitude,
Avg(Longitude) as Longitude,
avg(Speed) as speed
from vehicles
where Speed < 0.2
group by vehicleId
union
select VehicleId,
PositionTime ,
Latitude,
Longitude,
Speed
from vehicles
where Speed >= 0.2
Upvotes: 1