José Luzardo

Reputation: 61

Filter a GeoPandas GeoDataFrame to the points inside a polygon and drop the rows that fall outside it

I have a .csv file which contains some points (longitude, latitude). I converted it to a DataFrame and from DataFrame to a GeoDataFrame with this code:

CSV file:

Date;User ID;Longitude;Latitude
2020-01-02;824664;-79.8831613;-2.1811152000000003
2020-03-01;123456;80.8831613;2.1811
2020-01-15;147835;-80.78035200000001;-1.4845725

Code that I used to transform .csv to gdf:

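import pandas as pd
import geopandas

df = pd.read_csv('datos25.csv', sep=';', low_memory=False, decimal='.')
# column names must match the CSV header (Longitude, Latitude)
gdf = geopandas.GeoDataFrame(
    df, geometry=geopandas.points_from_xy(df.Longitude, df.Latitude))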

Then, I use this code to define my polygon which is a country:

world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
ec = world[world.name == 'Ecuador'] 

Now, for every POINT in gdf, I want to check whether it lies inside the polygon/country and, if it does not, remove that row from the DataFrame.

For example, the second value in the geometry column is:

POINT (80.8831613 2.1811)

The row containing this value should be removed from the DataFrame because it is not inside the polygon/country.

How can I do this?

Upvotes: 6

Views: 12748

Answers (2)

swatchai

Reputation: 18782

The spatial predicate within tests whether a point geometry lies inside a polygon geometry. The code below performs all the steps needed to identify the points that fall within a polygon (Ecuador). The final step plots the result as a visual check.

import pandas as pd
import geopandas
from shapely.geometry import Point

df = pd.read_csv('ecuador_data.csv', sep=';', low_memory=False, decimal='.')
# note: geopandas.datasets was removed in GeoPandas 1.0, so this call needs an older version
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
ecuador = world[world.name == 'Ecuador'] 

# polygon geometry for Ecuador (the filtered frame has a single row)
ecuador_geom = ecuador['geometry'].values[0]

# test each point against the polygon and collect True/False per row
withinQlist = []
for lon, lat in zip(df.Longitude, df.Latitude):
    pt = Point(lon, lat)
    withinQlist.append(pt.within(ecuador_geom))

# store the results in a new column, values: True/False
df['withinQ'] = withinQlist

# uncomment next line to see content of `df`
#print(df)

#          Date  User_ID  Longitude  Latitude  withinQ
# 0  2020-01-02   824664   -79.8832   -2.1811     True
# 1  2020-03-01   123456    80.8832    2.1811    False
# 2  2020-01-15   147835   -80.7804   -1.4845     True

# select points within ecuador, assign to `result_df` dataframe
result_df = df[df.withinQ]
# select points outside ecuador, assign to `xresult_df` dataframe
xresult_df = df[~df.withinQ]

# for checking/visualization, create a plot of relevant geometries
ax1 = ecuador.plot(color='pink')
ax1.scatter(result_df.Longitude, result_df.Latitude, s=50, color='green')
#ax1.scatter(xresult_df.Longitude, xresult_df.Latitude, s=30, color='red')

The plot (image not shown): Ecuador in pink, with the points that fall inside it in green.

The resulting dataframe result_df will look like this:

         Date  User_ID  Longitude  Latitude  withinQ
0  2020-01-02   824664   -79.8832   -2.1811     True
2  2020-01-15   147835   -80.7804   -1.4845     True
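
The loop above can also be written as one vectorized call, since a GeoDataFrame's within method accepts a single geometry and returns a boolean Series. This is a minimal sketch, not the exact Ecuador check: rough_ecuador is a hypothetical bounding rectangle with illustrative coordinates, used so the example runs without the naturalearth_lowres file.

```python
import pandas as pd
import geopandas
from shapely.geometry import Polygon

# Sample rows taken from the question's CSV
df = pd.DataFrame({
    "Date": ["2020-01-02", "2020-03-01", "2020-01-15"],
    "User_ID": [824664, 123456, 147835],
    "Longitude": [-79.8831613, 80.8831613, -80.78035200000001],
    "Latitude": [-2.1811152000000003, 2.1811, -1.4845725],
})

# Assumption: a rough rectangle standing in for the real Ecuador polygon
rough_ecuador = Polygon([(-81.5, -5.0), (-75.0, -5.0), (-75.0, 1.5), (-81.5, 1.5)])

# Build point geometries from the coordinate columns
gdf = geopandas.GeoDataFrame(
    df, geometry=geopandas.points_from_xy(df.Longitude, df.Latitude)
)

# One call tests every point; the result is a boolean Series aligned with gdf
inside = gdf.within(rough_ecuador)

# Keep only the rows whose point is inside the polygon
result_gdf = gdf[inside]
```

With the real data, ecuador['geometry'].values[0] would take the place of rough_ecuador.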

Upvotes: 4

Mel

Reputation: 331

For future reference, you can use the documentation in this link; I found it very helpful!

The process you are looking for is called Point in Polygon and, as the other answer mentions, you can use the .within() method.

Now, with what you already have I would do:

# find points in the polygon
# the call below returns a Series of boolean values;
# True means the point at that index location is within the polygon being evaluated
# (ec keeps its original row label from world, so select the geometry by position, not by label 0)
pip = gdf.within(ec.geometry.iloc[0])

# create a new GeoDataFrame that holds only the points inside the polygon
ec_gdf = gdf.loc[pip].copy()

# reset the index (optional step if you don't need to keep the original index values)
ec_gdf.reset_index(inplace=True, drop=True)
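
One detail of .within() is worth keeping in mind: a point that lies exactly on the polygon's boundary does not count as within it. The sketch below uses a hypothetical unit square rather than the Ecuador polygon, and compares within with shapely's covers, which does include the boundary.

```python
from shapely.geometry import Point, Polygon

# Hypothetical unit square used only for illustration
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])

test_points = [
    ("inside", Point(0.5, 0.5)),   # strictly inside the square
    ("edge", Point(1.0, 0.5)),     # exactly on the right-hand edge
    ("outside", Point(2.0, 0.5)),  # clearly outside
]

# For each point: (point.within(square), square.covers(point))
results = {
    name: (pt.within(square), square.covers(pt))
    for name, pt in test_points
}

for name, (is_within, is_covered) in results.items():
    print(f"{name}: within={is_within}, covers={is_covered}")
```

If points on the border should be kept, gdf.intersects(polygon) may be a better fit than gdf.within(polygon).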

Upvotes: 6
