Reputation: 33
I need help writing efficient Python code (using pandas) to find which vehicle passed closest to incident_sw = (35.7158, -120.7640), and at what time. I'm having trouble formulating a Euclidean distance to sort the df below and print the vehicle and corresponding time that are closest to incident_sw. All times are HH:MM:SS.SS (assume the times below are in hour 12).
My time conversion function:
def time_convert(str_time):
    values = str_time.split(':')
    mins = 60*(float(values[0]) - 12) + float(values[1]) + 1.0/60 * float(values[2])
    mins = round(mins, 4)
    return mins
My CSV dataframe:
vehicle time lat[D.DDD] lon[D.DDD]
veh_1 17:19.5 35.7167809 -120.7645652
veh_1 17:19.5 35.7167808 -120.7645652
veh_1 17:19.7 35.7167811 -120.7645648
veh_1 17:20.1 35.7167812 -120.7645652
veh_2 17:20.4 35.7167813 -120.7645647
veh_2 17:20.7 35.7167813 -120.7645646
veh_3 17:22.6 35.7167807 -120.7645651
veh_3 17:23.4 35.7167808 -120.7645652
veh_4 17:24.1 35.7167803 -120.7645653
veh_4 17:25.0 35.7167806 -120.7645658
veh_5 17:25.9 35.7167798 -120.7645659
veh_5 17:26.6 35.7167799 -120.7645658
Upvotes: 0
Views: 1782
Reputation: 8703
So, at the outset, I would recommend you use a library like Geopy to do the heavy lifting of calculating the distances between points. Secondly, I would recommend using GeoPandas to store geographic information. More on that later.
Assuming your distance function is called distance (you can code it yourself or get it from Geopy, as you prefer), the following will speed things up somewhat. Note that the implementation below is still a loop, even though it uses vectorize from the numpy library. It is also pseudo-code, so you will have to modify it to work for you.
import numpy as np

def dist_calc(list_of_points, point):
    # transform() passes each group's Series as the first positional argument,
    # so the reference point must come second and be given by keyword.
    dist = np.vectorize(lambda x: distance(point, x))
    return dist(list_of_points)

# Now you can call it simply using:
df['points'] = list(zip(df['lat'], df['lon']))
df.groupby('vehicle')['points'].transform(dist_calc, point=incident_sw)
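If you would rather code distance yourself than pull in Geopy, one option is the haversine (great-circle) formula. The following is only a minimal sketch using the standard library; the spherical-Earth radius constant and the (lat, lon) tuple layout in degrees are assumptions, not something given in the question.

```python
import math

# Assumption: spherical Earth with the mean radius in metres.
EARTH_RADIUS_M = 6371008.8

def distance(point_a, point_b):
    """Great-circle (haversine) distance in metres between two
    (lat, lon) tuples given in decimal degrees."""
    lat1, lon1 = map(math.radians, point_a)
    lat2, lon2 = map(math.radians, point_b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

incident_sw = (35.7158, -120.7640)
first_fix = (35.7167809, -120.7645652)  # first row of the question's data
metres = distance(incident_sw, first_fix)
```

Over distances this short a plain Euclidean distance on degrees would rank points almost the same way, but it ignores that a degree of longitude is shorter than a degree of latitude at this latitude, so the haversine form is the safer default.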
The reason for recommending GeoPandas is simple. If you have a huge number of points to search through, say each vehicle leaves a trail of points every minute or second, then the above answer will take a long time to compute. If you store your data in GeoPandas, you can use its buffer and intersects tools to limit the search space around your incidents. In that case, you build a reasonably sized buffer around each incident and only search the vehicle points that fall inside that buffer. That will help speed up your code.
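To illustrate the idea of limiting the search space, here is a rough standard-library stand-in that uses a latitude/longitude bounding box instead of the GeoPandas buffer and intersects tools. It is not the GeoPandas API; the helper names, the 200 m radius, the metres-per-degree constant, and the far-away second point are all assumptions made up for the sketch.

```python
import math

# Assumption: roughly 111,320 metres per degree of latitude.
METRES_PER_DEG_LAT = 111_320.0

def bounding_box(center, radius_m):
    """Return (min_lat, max_lat, min_lon, max_lon) for a box that roughly
    covers a circle of radius_m metres around center = (lat, lon)."""
    lat, lon = center
    dlat = radius_m / METRES_PER_DEG_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    dlon = radius_m / (METRES_PER_DEG_LAT * math.cos(math.radians(lat)))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon

def in_box(point, box):
    """Cheap test: does the (lat, lon) point fall inside the box?"""
    lat, lon = point
    min_lat, max_lat, min_lon, max_lon = box
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon

incident_sw = (35.7158, -120.7640)
points = [
    (35.7167809, -120.7645652),  # row from the question, about 120 m away
    (35.9000000, -120.7645652),  # hypothetical far-away fix
]
box = bounding_box(incident_sw, 200.0)
candidates = [p for p in points if in_box(p, box)]
```

Only the points left in candidates then need the more expensive exact distance calculation.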
I would recommend you take a day to familiarize yourself with all the capabilities of GeoPandas before proceeding.
from geopy.distance import great_circle
import numpy as np

def dist_calc(list_of_points, point):
    # transform() passes each group's Series first; the reference point is a keyword.
    dist = np.vectorize(lambda x: great_circle(point, x).meters)
    return dist(list_of_points)

# Now you can call it simply using:
df['points'] = list(zip(df['lat'], df['lon']))
df['distances'] = df.groupby('vehicle')['points'].transform(dist_calc, point=incident_sw)
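Once a distance column exists, picking the closest vehicle and time is a single idxmin lookup. The sketch below assumes numpy and pandas are installed and swaps the per-row np.vectorize loop for a haversine computed directly on whole columns. The helper name haversine_m, the distance_m column, and the shortened lat/lon column names are assumptions, while the rows are copied from the question.

```python
import numpy as np
import pandas as pd

incident_sw = (35.7158, -120.7640)

# Rows copied from the question; column names shortened to lat/lon.
df = pd.DataFrame({
    "vehicle": ["veh_1", "veh_1", "veh_1", "veh_1", "veh_2", "veh_2",
                "veh_3", "veh_3", "veh_4", "veh_4", "veh_5", "veh_5"],
    "time": ["17:19.5", "17:19.5", "17:19.7", "17:20.1", "17:20.4", "17:20.7",
             "17:22.6", "17:23.4", "17:24.1", "17:25.0", "17:25.9", "17:26.6"],
    "lat": [35.7167809, 35.7167808, 35.7167811, 35.7167812, 35.7167813,
            35.7167813, 35.7167807, 35.7167808, 35.7167803, 35.7167806,
            35.7167798, 35.7167799],
    "lon": [-120.7645652, -120.7645652, -120.7645648, -120.7645652,
            -120.7645647, -120.7645646, -120.7645651, -120.7645652,
            -120.7645653, -120.7645658, -120.7645659, -120.7645658],
})

def haversine_m(lat, lon, point, radius_m=6371008.8):
    """Haversine distance in metres from every (lat, lon) row to point.
    Assumes a spherical Earth; inputs are in decimal degrees."""
    lat1, lon1 = np.radians(point[0]), np.radians(point[1])
    lat2, lon2 = np.radians(lat), np.radians(lon)
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_m * np.arcsin(np.sqrt(a))

# One pass over whole columns, no Python-level loop.
df["distance_m"] = haversine_m(df["lat"], df["lon"], incident_sw)

# Row with the smallest distance: its vehicle and time answer the question.
closest = df.loc[df["distance_m"].idxmin()]
print(closest["vehicle"], closest["time"], round(closest["distance_m"], 3))
```

If you need the closest fix per vehicle rather than overall, df.loc[df.groupby("vehicle")["distance_m"].idxmin()] returns one row for each vehicle.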
Upvotes: 0