guyts

Reputation: 949

Selecting rows in geopandas or pandas based on latitude/longitude and radius

I have a pandas DataFrame in which each row contains several measures, as well as latitude and longitude values. I can convert those into geopandas points if needed.

From this dataframe, I would like to select only rows that fall within a certain (let's say 1km) radius from a new given lat/long.

Is there a sensible way to approach this problem?

Here's a data sample from the df:

id   lat      long     pollution   label
---------------------------------------------
3    45.467   -79.51   7           'nice'
7    45.312   -79.56   8           'mediocre'

A sample lat/long would be lat = 45.4 and long = -79.5.

Upvotes: 5

Views: 6545

Answers (3)

pedrobin

Reputation: 181

Building on sharder's solution, I found it convenient to apply a filter function. It also seems to execute faster.

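# Relies on getDist, df, newlat and newlon from sharder's answer
def is_within(row, lat2, lon2, max_dist):
    # True when the row is closer than max_dist (km) to (lat2, lon2)
    return getDist(row['lat'], row['lon'], lat2, lon2) < max_dist

df[df.apply(is_within, args=(newlat, newlon, 600), axis=1)]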

Upvotes: 1

jberrio

Reputation: 1124

You can use the following algorithm:

  1. Create a geodataframe (gdfdata) from the input data (pd dataframe)

  2. Create another geodataframe (gdfsel) with the center point for the selection

  3. Create a buffer around the center point (make gdfselbuff from gdfsel) for the selection

  4. Use the within method of geopandas to find the points inside the buffer, e.g. gdf_within = gdfdata.loc[gdfdata.geometry.within(gdfselbuff.unary_union)] (in geopandas 1.0 and later, unary_union is deprecated in favor of union_all())

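To make the buffer, you can use GeoSeries.buffer(distance, resolution). Note that distance is expressed in the units of the layer's CRS, so lat/long data should be reprojected to a metric CRS before buffering by a distance such as 1 km. See these links for reference.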

http://geopandas.org/geometric_manipulations.html

https://gis.stackexchange.com/questions/253224/geopandas-buffer-using-geodataframe-while-maintaining-the-dataframe
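The four steps above can be sketched roughly as follows, using the sample rows from the question. This is only a sketch under some assumptions: the coordinates are WGS84 (EPSG:4326), a recent geopandas is installed (estimate_utm_crs needs geopandas 0.9+ with pyproj 3+), and the radius is set to 10 km rather than 1 km purely so that the two sample rows give a non-empty result. The names radius_m, utm and gdfdata_m are made up for the illustration.

```python
import geopandas as gpd
import pandas as pd

# Sample rows from the question
df = pd.DataFrame({
    "id": [3, 7],
    "lat": [45.467, 45.312],
    "long": [-79.51, -79.56],
})
center_lat, center_long = 45.4, -79.5
radius_m = 10_000  # illustrative radius in metres (assumption, not from the question)

# 1. GeoDataFrame from the input data (assumed to be WGS84 lat/long)
gdfdata = gpd.GeoDataFrame(
    df, geometry=gpd.points_from_xy(df["long"], df["lat"]), crs="EPSG:4326"
)

# 2. GeoDataFrame holding the center point of the selection
gdfsel = gpd.GeoDataFrame(
    geometry=gpd.points_from_xy([center_long], [center_lat]), crs="EPSG:4326"
)

# 3. Reproject both layers to a metric CRS so the buffer distance is in metres
utm = gdfdata.estimate_utm_crs()
gdfdata_m = gdfdata.to_crs(utm)
gdfselbuff = gdfsel.to_crs(utm).buffer(radius_m)

# 4. Keep the rows whose projected point falls inside the buffer polygon
gdf_within = gdfdata.loc[gdfdata_m.geometry.within(gdfselbuff.iloc[0])]
print(gdf_within[["id", "lat", "long"]])
```

Since there is a single center point, the sketch takes the only buffer polygon with .iloc[0] instead of dissolving the buffers with unary_union. The buffer is a polygon approximation of a circle, so points lying almost exactly on the radius may be classified either way.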

Upvotes: 3

sharder

Reputation: 141

Here's a working example. First, write a function to calculate the distance. I implemented a simple haversine distance calculation, but I would recommend using whichever method you find most useful. Next, you can subset the DataFrame to the rows within your desired distance.

import pandas as pd

#Initialize DataFrame
df=pd.DataFrame(columns=['location','lat','lon'])
df['location']=['LA','NY','LV']
df['lat']=[34.05,40.71,36.16]
df['lon']=[-118.24,-74.00,-115.14]

#New point Reno 39.53,-119.81
newlat=39.53
newlon=-119.81

#Import trig stuff from math
from math import sin, cos, sqrt, atan2,radians

#Distance function between two lat/lon
def getDist(lat1,lon1,lat2,lon2):
  R = 6373.0  # approximate radius of the Earth in km

  lat1 = radians(lat1)
  lon1 = radians(lon1)
  lat2 = radians(lat2)
  lon2 = radians(lon2)

  dlon = lon2 - lon1
  dlat = lat2 - lat1

  a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
  c = 2 * atan2(sqrt(a), sqrt(1 - a))

  return R * c

#Apply distance function to each row of the dataframe
df['dist']=df.apply(lambda row: getDist(row['lat'],row['lon'],newlat,newlon), axis=1)

#This will give all locations within a radius of 600 km
df[df['dist']<600]
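For larger DataFrames, the same haversine formula can also be computed on whole columns at once with NumPy instead of row by row. The following is a minimal sketch of that idea, not a replacement for the code above: haversine_km, EARTH_RADIUS_KM and within_600 are names invented for the illustration, and the radius is the same 6373 km used in getDist.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6373.0  # same approximate Earth radius as in getDist

def haversine_km(lat, lon, lat0, lon0):
    """Great-circle distance in km from each (lat, lon) to the point (lat0, lon0)."""
    lat, lon = np.radians(lat), np.radians(lon)
    lat0, lon0 = np.radians(lat0), np.radians(lon0)
    a = (np.sin((lat - lat0) / 2) ** 2
         + np.cos(lat) * np.cos(lat0) * np.sin((lon - lon0) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

df = pd.DataFrame({
    "location": ["LA", "NY", "LV"],
    "lat": [34.05, 40.71, 36.16],
    "lon": [-118.24, -74.00, -115.14],
})
newlat, newlon = 39.53, -119.81  # Reno

# One vectorized call computes the distance for every row
df["dist"] = haversine_km(df["lat"], df["lon"], newlat, newlon)

# Boolean mask keeps only the rows within 600 km of the new point
within_600 = df[df["dist"] < 600]
print(within_600)
```

With this sample data only LV ends up in within_600, since LA is a little over 600 km from Reno.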

Upvotes: 6
