vaffanb

Reputation: 1

Get places in a given geographical area (code optimization)

I have a DataFrame with latitude and longitude of places (restaurants) and a DataFrame with latitude and longitude of neighborhoods (area).

For each neighborhood, I would like to count the number of restaurants within a 3 km radius (numberR).

I have written the following code, and it works:

df=pd.DataFrame()
numberR=[]
radius=3

for element in range(0,area['lon'].count()): #for every neighborhood  
    df=pd.DataFrame()
    df['destLat']=restaurants['lat']
    df['originLat']=area['lat'][element]
    df['destLon']= restaurants['lng']
    df['originLon']=area['lon'][element]

    for i, row in df.iterrows():
        #for every restaurant I compute the distance from my neighborhood in km
        l=[haversine(df.originLon[i],df.originLat[i],df.destLon[i],df.destLat[i]) for i, row in df.iterrows()]

    numberR.append(sum(x<radius for x in l))

However, I would like to make the code quicker as it is very slow.

Do you have any idea how I could get the same result in less time?

Thanks in advance.

P.S. haversine is the well-known function that returns the great-circle distance in km between two points given their longitudes and latitudes (here called as haversine(lon1, lat1, lon2, lat2)).
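
For reference, here is a minimal sketch of what such a function might look like. It assumes the (lon1, lat1, lon2, lat2) argument order used in the loop above, coordinates in degrees, and a mean Earth radius of 6371 km; the implementation actually used is not shown in the question.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in degrees."""
    # Convert degrees to radians before applying the formula
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))
```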

Upvotes: 0

Views: 38

Answers (1)

I would recommend using the functions from scipy.spatial.distance.

from scipy.spatial.distance import cdist

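# metric accepts a callable; it is called with two 1-D arrays, here (lon, lat) of each point
distances = cdist(area[['lon', 'lat']].to_numpy(), restaurants[['lng', 'lat']].to_numpy(), metric=haversine)
numberR = (distances < 3).sum(axis=1)  # per neighborhood: number of restaurants closer than 3 km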

The cdist function computes the distance between each pair of rows of its two inputs, so the result is a matrix with one row per neighborhood and one column per restaurant.

Also, you will need to modify the haversine function so that it accepts two rows (1-D arrays of coordinates) instead of four separate numbers.
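
To illustrate, here is a rough sketch of a haversine variant that cdist can call, followed by the per-neighborhood count. The name haversine_rows, the (lon, lat) column order, the 6371 km Earth radius, and the tiny sample DataFrames are all assumptions made up for this example, not part of the original code.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_rows(u, v):
    """Distance in km between two rows, each given as (lon, lat) in degrees."""
    lon1, lat1, lon2, lat2 = np.radians([u[0], u[1], v[0], v[1]])
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# Made-up sample data using the column names from the question
area = pd.DataFrame({'lon': [0.0, 1.0], 'lat': [0.0, 0.0]})
restaurants = pd.DataFrame({'lng': [0.01, 0.0, 1.0], 'lat': [0.0, 0.02, 0.0]})

# Rows of the result are neighborhoods, columns are restaurants
distances = cdist(area[['lon', 'lat']].to_numpy(),
                  restaurants[['lng', 'lat']].to_numpy(),
                  metric=haversine_rows)

radius = 3
# Count, for each neighborhood (row), the restaurants closer than radius km
numberR = (distances < radius).sum(axis=1)
```

Note that with a callable metric, cdist still calls the Python function once per pair, so the gain comes mainly from dropping the iterrows loops and the temporary DataFrames.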

Upvotes: 1
