Reputation: 1
I have a DataFrame with the latitude and longitude of places (restaurants) and a DataFrame with the latitude and longitude of neighborhoods (area).
I would like, for each neighborhood, to count the number of restaurants within a 3 km radius (numberR).
I have written the following code, and it works:
    import pandas as pd

    df = pd.DataFrame()
    numberR = []
    radius = 3  # km
    for element in range(0, area['lon'].count()):  # for every neighborhood
        df = pd.DataFrame()
        df['destLat'] = restaurants['lat']
        df['originLat'] = area['lat'][element]
        df['destLon'] = restaurants['lng']
        df['originLon'] = area['lon'][element]
        for i, row in df.iterrows():
            # for every restaurant I compute the distance from my neighborhood in km
            l = [haversine(df.originLon[i], df.originLat[i], df.destLon[i], df.destLat[i]) for i, row in df.iterrows()]
        # count the restaurants closer than the radius
        numberR.append(sum(x < radius for x in l))
However, the code is very slow and I would like to make it faster.
Do you have any idea how I could get the same result in less time?
Thanks in advance.
P.S. haversine is the well-known function for computing the distance in km between two points given their lat and lng.
Upvotes: 0
Views: 38
Reputation: 26
I would recommend using the functions from scipy.spatial.distance.
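    from scipy.spatial.distance import cdist
    # pass only the coordinate columns; metric accepts a callable
    distances = cdist(area[['lat', 'lon']].to_numpy(), restaurants[['lat', 'lng']].to_numpy(), metric=haversine)
    (distances < 3).sum(axis=1)  # restaurants within 3 km, one count per neighborhood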
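The cdist function computes the distance between each pair of rows from the two DataFrames, giving one row per neighborhood and one column per restaurant.
You will also need to modify the haversine function so that it accepts two rows, each passed as a 1-D array of coordinates.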
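For illustration, here is a minimal sketch of what such a two-row haversine could look like and how it plugs into cdist. The names haversine_rows and haversine_matrix, the Earth radius constant, and the sample coordinates are assumptions made up for this example; only the column names (lat, lon, lng) come from the question. The second function is not cdist at all but a plain NumPy broadcasting alternative, shown for comparison.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (approximation)


def haversine_rows(u, v):
    """Distance in km between two (lat, lon) rows given in degrees.

    cdist calls this once per pair of rows, passing two 1-D arrays.
    """
    lat1, lon1, lat2, lon2 = np.radians([u[0], u[1], v[0], v[1]])
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def haversine_matrix(lat1, lon1, lat2, lon2):
    """Same formula, computed for all pairs at once with broadcasting."""
    lat1 = np.radians(lat1)[:, None]  # column vector: one row per origin
    lon1 = np.radians(lon1)[:, None]
    lat2 = np.radians(lat2)[None, :]  # row vector: one column per destination
    lon2 = np.radians(lon2)[None, :]
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# Made-up sample data using the question's column names
area = pd.DataFrame({'lat': [45.0, 10.0], 'lon': [9.0, 10.0]})
restaurants = pd.DataFrame({'lat': [45.0, 45.01, 46.0], 'lng': [9.0, 9.0, 9.0]})

radius = 3  # km

# Variant 1: cdist with a callable metric
distances = cdist(area[['lat', 'lon']].to_numpy(),
                  restaurants[['lat', 'lng']].to_numpy(),
                  metric=haversine_rows)
numberR = (distances < radius).sum(axis=1)

# Variant 2: pure NumPy broadcasting, no Python-level loop over pairs
distances_b = haversine_matrix(area['lat'].to_numpy(), area['lon'].to_numpy(),
                               restaurants['lat'].to_numpy(), restaurants['lng'].to_numpy())
numberR_b = (distances_b < radius).sum(axis=1)

print(numberR.tolist())
```

With a Python callable as the metric, cdist still calls the function once per pair, so the broadcasting variant is likely to be the faster of the two on large inputs, at the cost of holding the full neighborhoods-by-restaurants matrix in memory.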
Upvotes: 1