Reputation: 61
I have a .csv file that contains some points (longitude, latitude). I read it into a DataFrame and then converted that DataFrame to a GeoDataFrame. The data and the code are shown below.
CSV file:
Date;User ID;Longitude;Latitude
2020-01-02;824664;-79.8831613;-2.1811152000000003
2020-03-01;123456;80.8831613;2.1811
2020-01-15;147835;-80.78035200000001;-1.4845725
Code that I used to transform .csv to gdf:
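import pandas as pd
import geopandas
df = pd.read_csv('datos25.csv', sep=';', low_memory=False, decimal='.')
# column names must match the CSV header shown above (Longitude, Latitude)
gdf = geopandas.GeoDataFrame(
    df, geometry=geopandas.points_from_xy(df.Longitude, df.Latitude))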
Then, I use this code to define my polygon which is a country:
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
ec = world[world.name == 'Ecuador']
Now, for every POINT in gdf, I want to check whether it lies inside the polygon/country and, if it does not, remove that row from the DataFrame.
For example, in this case, the second value in geometry column which is:
POINT (80.8831613 2.1811)
The row containing this value should be removed from the DataFrame because the point is not inside the polygon/country.
How can I do this?
Upvotes: 6
Views: 12748
Reputation: 18782
The spatial operation within is needed to identify whether a point geometry is located within a polygon geometry. The code below performs all the steps needed to identify the points that fall within a polygon (Ecuador). In the final step, a plot is created to visualize and check the result.
import pandas as pd
import geopandas
from shapely.geometry import Point #Polygon
df = pd.read_csv('ecuador_data.csv', sep=';', low_memory=False, decimal='.')
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
ecuador = world[world.name == 'Ecuador']
# add new column to df
df['withinQ'] = ""
withinQlist = []
for lon,lat in zip(df.Longitude, df.Latitude):
    pt = Point(lon, lat)
    withinQ = pt.within(ecuador['geometry'].values[0])
    #print( withinQ )
    withinQlist.append(withinQ)
# update values in that column, values: True/False
df['withinQ'] = withinQlist
# uncomment next line to see content of `df`
#print(df)
#          Date  User_ID  Longitude  Latitude  withinQ
# 0  2020-01-02   824664   -79.8832   -2.1811     True
# 1  2020-03-01   123456    80.8832    2.1811    False
# 2  2020-01-15   147835   -80.7804   -1.4845     True
# select points within ecuador, assign to `result_df` dataframe
result_df = df[df.withinQ==True]
# select points outside ecuador, assign to `xresult_df` dataframe
xresult_df = df[df.withinQ==False]
# for checking/visualization, create a plot of relevant geometries
ax1 = ecuador.plot(color='pink')
ax1.scatter(result_df.Longitude, result_df.Latitude, s=50, color='green')
#ax1.scatter(xresult_df.Longitude, xresult_df.Latitude, s=30, color='red')
The plot:
The content of the resulting dataframe result_df will look like this:
         Date  User_ID  Longitude  Latitude  withinQ
0  2020-01-02   824664   -79.8832   -2.1811     True
2  2020-01-15   147835   -80.7804   -1.4845     True
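As a possible alternative to the explicit loop, the same check can be done in one vectorized call with GeoSeries.within. The sketch below is only an illustration under a few assumptions: the three sample rows are read from an in-memory string instead of a file, and country_shape is a rough, hypothetical bounding box standing in for the real Ecuador polygon (recent GeoPandas releases no longer ship naturalearth_lowres through geopandas.datasets, so the country geometry would have to come from a file you download yourself).

```python
import io

import geopandas
import pandas as pd
from shapely.geometry import box

# The three sample rows from the question, read from an in-memory string.
csv_text = """Date;User ID;Longitude;Latitude
2020-01-02;824664;-79.8831613;-2.1811152000000003
2020-03-01;123456;80.8831613;2.1811
2020-01-15;147835;-80.78035200000001;-1.4845725
"""
df = pd.read_csv(io.StringIO(csv_text), sep=";")

# ASSUMPTION: a rough bounding box (min_lon, min_lat, max_lon, max_lat)
# standing in for the Ecuador polygon; swap in the real country geometry.
country_shape = box(-81.1, -5.1, -75.2, 1.5)

# Build point geometries from the coordinate columns.
gdf = geopandas.GeoDataFrame(
    df,
    geometry=geopandas.points_from_xy(df["Longitude"], df["Latitude"]),
    crs="EPSG:4326",
)

# Test every point against the polygon at once; yields a boolean Series.
inside = gdf.within(country_shape)

result_gdf = gdf[inside]    # rows whose point lies inside the polygon
outside_gdf = gdf[~inside]  # rows whose point lies outside the polygon
```

With the real country polygon in place of the box, result_gdf plays the same role as result_df above, without the intermediate withinQ column.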
Upvotes: 4
Reputation: 331
For future reference you can use the documentation in this link, I found it very helpful!
The process you are looking for is called Point in Polygon and, as the other answer mentions, you can use the function .within()
Now, with what you already have I would do:
# point-in-polygon test: returns a boolean Series in which True means
# the point at that index location lies within the polygon being evaluated
# ec keeps its row label from world, so pick the geometry by position, not by label 0
pip = gdf.within(ec.geometry.iloc[0])
# create a new GeoDataFrame that holds only the points inside the polygon
ec_gdf = gdf.loc[pip].copy()
# reset the index (optional step if you don't need to keep the original index values)
ec_gdf.reset_index(inplace=True, drop=True)
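If there is more than one polygon to test against, a spatial join may be more convenient than calling .within() per polygon. The following is a minimal sketch, not a drop-in replacement: it assumes a GeoPandas version in which sjoin accepts the predicate keyword (older releases used op), and the countries frame holds a hypothetical bounding box in place of the real Ecuador geometry. The names points, countries and kept are made up for the example.

```python
import geopandas
from shapely.geometry import Point, box

# Sample points taken from the question.
points = geopandas.GeoDataFrame(
    {"user_id": [824664, 123456, 147835]},
    geometry=[
        Point(-79.8831613, -2.1811152),
        Point(80.8831613, 2.1811),
        Point(-80.780352, -1.4845725),
    ],
    crs="EPSG:4326",
)

# ASSUMPTION: a one-row polygon layer with a rough bounding box
# standing in for the real country geometry.
countries = geopandas.GeoDataFrame(
    {"name": ["Ecuador (bounding-box stand-in)"]},
    geometry=[box(-81.1, -5.1, -75.2, 1.5)],
    crs="EPSG:4326",
)

# An inner join keeps only the points that fall within some polygon.
joined = geopandas.sjoin(points, countries, how="inner", predicate="within")

# Drop the helper column added by the join and renumber the rows.
kept = joined.drop(columns=["index_right"]).reset_index(drop=True)
```

The joined frame also carries the polygon's attribute columns (here name), which helps when each point has to be labelled with the country it falls in.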
Upvotes: 6