CatLady

Reputation: 33

Given specific lat/lon calculate closest point from csv list of lat/lon

I need help with efficient Python code (using pandas) to find which vehicle passed closest to incident_sw = (35.7158, -120.7640), and at what time. I'm having trouble formulating a Euclidean distance to sort through the df below and print the vehicle and corresponding time that are closest to incident_sw. All times are HH:MM:SS.SS (assume the times below are in hour 12).

My time conversion function--

def time_convert(str_time):
    values = str_time.split(':')
    mins = 60*(float(values[0]) - 12) + float(values[1]) + 1.0/60 * float(values[2])
    mins = round(mins, 4)
    return mins

My csv dataframe--

vehicle time    lat[D.DDD]  lon[D.DDD]
veh_1   17:19.5 35.7167809  -120.7645652
veh_1   17:19.5 35.7167808  -120.7645652
veh_1   17:19.7 35.7167811  -120.7645648
veh_1   17:20.1 35.7167812  -120.7645652
veh_2   17:20.4 35.7167813  -120.7645647
veh_2   17:20.7 35.7167813  -120.7645646
veh_3   17:22.6 35.7167807  -120.7645651
veh_3   17:23.4 35.7167808  -120.7645652
veh_4   17:24.1 35.7167803  -120.7645653
veh_4   17:25.0 35.7167806  -120.7645658
veh_5   17:25.9 35.7167798  -120.7645659
veh_5   17:26.6 35.7167799  -120.7645658

Upvotes: 0

Views: 1782

Answers (1)

Kartik

Reputation: 8703

First, I would recommend using a library like Geopy to do the heavy lifting of calculating the distances between points. Second, I would recommend using GeoPandas to store geographic information. More on that later.

Assuming your distance function is called distance (write it yourself or take it from Geopy, as you prefer), the code below should help speed things up somewhat. Note that it is still a loop under the hood, even though it uses vectorize from the NumPy library. Also, it is pseudo-code, and you will have to modify it to work for you.

import numpy as np

def dist_calc(list_of_points, point):
    # transform passes each group's points as the first positional argument,
    # so the fixed reference point has to come second (as a keyword)
    dist = np.vectorize(lambda x: distance(point, x))
    return dist(list_of_points)

# Now you can call it simply using:
df['points'] = list(zip(df['lat'], df['lon']))
df.groupby('vehicle')['points'].transform(dist_calc, point=incident_sw)

The reason for recommending GeoPandas is simple. If you have a huge number of points to search, say each vehicle leaves a trail of points every minute or second, then the above approach will take a long time to compute. If you store your data in a GeoPandas GeoDataFrame, you can use the buffer and intersects tools in GeoPandas to limit the search space around your incidents. In that case, you build a reasonably sized buffer around each incident and search only those vehicle points that fall inside that buffer. That will help speed up your code.

I would recommend you take a day to familiarize yourself with all the capabilities of GeoPandas before proceeding.
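To try the idea of limiting the search space before committing to GeoPandas, a plain-pandas bounding box gives a rough stand-in. This is only a sketch and not the GeoPandas buffer/intersects API: the helper name within_box, the half_width_deg parameter, the column names 'lat' and 'lon', and the far-away test row are all assumptions made for illustration.

```python
import pandas as pd

def within_box(df, point, half_width_deg):
    """Keep rows whose lat/lon fall inside a box centred on point.

    The box is measured in degrees, so it is not square in metres;
    it is only meant as a cheap pre-filter before the real distance step.
    """
    lat0, lon0 = point
    mask = (df['lat'].between(lat0 - half_width_deg, lat0 + half_width_deg)
            & df['lon'].between(lon0 - half_width_deg, lon0 + half_width_deg))
    return df[mask]

incident_sw = (35.7158, -120.7640)

# Two rows from the question plus one made-up far-away point for illustration.
df = pd.DataFrame({
    'vehicle': ['veh_1', 'veh_5', 'veh_far'],
    'lat': [35.7167809, 35.7167798, 36.5],
    'lon': [-120.7645652, -120.7645659, -121.9],
})

candidates = within_box(df, incident_sw, half_width_deg=0.002)
print(candidates['vehicle'].tolist())
```

Only the rows in candidates would then go through the more expensive distance calculation.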


Using great_circle from geopy

from geopy.distance import great_circle
import numpy as np

def dist_calc(list_of_points, point):
    # transform passes each group's points as the first positional argument
    dist = np.vectorize(lambda x: great_circle(point, x).meters)
    return dist(list_of_points)

# Now you can call it simply using:
df['points'] = list(zip(df['lat'], df['lon']))
df['distances'] = df.groupby('vehicle')['points'].transform(dist_calc, point=incident_sw)
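Once a distances column exists, the vehicle and time the question asks for are just the row with the smallest distance. The sketch below shows that last step end to end. To stay free of the geopy dependency it swaps in a hand-written haversine formula on a spherical Earth (an assumption, and a different technique from great_circle above, though it computes the same kind of distance). The column names 'lat' and 'lon' and the helper name haversine_m are also assumptions.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres (spherical approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres; lat2/lon2 may be whole columns."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))

incident_sw = (35.7158, -120.7640)

# A few rows copied from the question's data.
df = pd.DataFrame({
    'vehicle': ['veh_1', 'veh_2', 'veh_4', 'veh_5'],
    'time': ['17:19.5', '17:20.4', '17:24.1', '17:25.9'],
    'lat': [35.7167809, 35.7167813, 35.7167803, 35.7167798],
    'lon': [-120.7645652, -120.7645647, -120.7645653, -120.7645659],
})

# Whole-column arithmetic: no groupby and no Python-level loop needed here.
df['distances'] = haversine_m(incident_sw[0], incident_sw[1], df['lat'], df['lon'])

# idxmin returns the index label of the smallest distance.
closest = df.loc[df['distances'].idxmin()]
print(closest['vehicle'], closest['time'], round(closest['distances'], 2))
```

To get the closest pass per vehicle instead of the single closest row, df.loc[df.groupby('vehicle')['distances'].idxmin()] should work on the same column.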

Upvotes: 0
