I have a dataframe with the coordinates and elevation of 1,260 points:
df_elevation
Longitud Latitud Elevación
0 -5.879263 42.579535 937
1 -5.879303 42.579535 937
2 -5.879342 42.579535 937
3 -5.879382 42.579535 937
4 -5.879422 42.579535 937
... ... ... ...
1255 -5.880498 42.582213 933
1256 -5.880538 42.582213 933
1257 -5.880578 42.582213 933
1258 -5.880618 42.582213 933
1259 -5.880657 42.582213 933
1260 rows × 3 columns
I also have two lists of coordinates that make up a polygon:
lat_list = [42.582213356031694, 42.57966169458114, 42.57945629314298, 42.582142258520136, 42.582213356031694]
lon_list = [-5.880088806152344, -5.880657434463501, -5.879863500595092, -5.879262685775757, -5.880088806152344]
I want to select only the rows of the dataframe that are inside this polygon, or, equivalently, delete the rows that are outside it.
Upvotes: 0
If you have a large dataset, I suggest using a spatial index, as it can greatly reduce processing time. GeoPandas has a slick implementation of the R-tree spatial index, which is explained very nicely by Geoff Boeing.
Here is an example that expands upon @RJ's answer. First we'll build the df again.
from shapely.geometry import Point, Polygon
import geopandas as gpd
import pandas as pd
data = [ { "ID": 0, "Longitud": -5.879263, "Latitud": 42.579535, "Elevación": 937 }, { "ID": 1, "Longitud": -5.879303, "Latitud": 42.579535, "Elevación": 937 }, { "ID": 2, "Longitud": -5.879342, "Latitud": 42.579535, "Elevación": 937 }, { "ID": 3, "Longitud": -5.879382, "Latitud": 42.579535, "Elevación": 937 }, { "ID": 4, "Longitud": -5.879422, "Latitud": 42.579535, "Elevación": 937 }, { "ID": 1255, "Longitud": -5.880498, "Latitud": 42.582213, "Elevación": 933 }, { "ID": 1256, "Longitud": -5.880538, "Latitud": 42.582213, "Elevación": 933 }, { "ID": 1257, "Longitud": -5.880578, "Latitud": 42.582213, "Elevación": 933 }, { "ID": 1258, "Longitud": -5.880618, "Latitud": 42.582213, "Elevación": 933 }, { "ID": 1259, "Longitud": -5.880657, "Latitud": 42.582213, "Elevación": 933 }, { "ID": 1260, "Longitud": -5.879323515030888, "Latitud": 42.58192907018969, "Elevación": 933 }, { "ID": 1261, "Longitud": -5.879799662054768, "Latitud": 42.58143025825665, "Elevación": 933 }, { "ID": 1262, "Longitud": -5.880003215470649, "Latitud": 42.58117728748368, "Elevación": 933 } ]
df = pd.DataFrame(data)
df = df.set_index('ID')
lat_list = [42.582213356031694, 42.57966169458114, 42.57945629314298, 42.582142258520136, 42.582213356031694]
lon_list = [-5.880088806152344, -5.880657434463501, -5.879863500595092, -5.879262685775757, -5.880088806152344]
polygon = Polygon(zip(lon_list, lat_list))
Next, we will create a geodataframe using gpd's points_from_xy().
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df['Longitud'], df['Latitud']))
To demonstrate the time savings a spatial index gives, we artificially enlarge the GeoDataFrame: we build a list containing 10,000 copies of gdf and then use pd.concat() to join them into one much larger GeoDataFrame. This gives us 130,000 rows rather than 13.
gdf_list = [gdf] * 10000
gdf_cat = pd.concat(gdf_list)
Finally, we create the spatial index and then use it to return the rows whose points fall inside the polygon. Note that variables assigned in a Jupyter cell run with the %%timeit magic are not kept after the cell finishes.
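%%timeit
# R-tree index over the point geometries (built on first access, then reused)
spatial_index = gdf_cat.sindex
# Coarse step: integer positions of points whose bounding boxes intersect the polygon's bounding box
possible_matches_index = list(spatial_index.intersection(polygon.bounds))
possible_matches = gdf_cat.iloc[possible_matches_index]
# Exact step: keep only the candidates that actually intersect the polygon
precise_matches = possible_matches[possible_matches.intersects(polygon)]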
Timing this with the %%timeit magic in Jupyter shows roughly a 3x speedup from the spatial index compared with the apply() method (1.95 s here versus 6.23 s below).
1.95 s ± 48.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
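The coarse step above, spatial_index.intersection(polygon.bounds), only compares bounding boxes, which is why the exact intersects() check is still needed afterwards. For a single polygon, that coarse step can be sketched in plain pandas as a bounding-box prefilter. This is a minimal illustration, not what GeoPandas does internally: bbox_prefilter is a hypothetical helper, rows 0 and 1260 come from the sample data, and row 9999 is a made-up point east of the polygon, added only to show a row being dropped.

```python
import pandas as pd


def bbox_prefilter(df, lon_list, lat_list):
    """Hypothetical helper: keep rows whose point falls inside the polygon's
    axis-aligned bounding box. This is only a coarse filter, not the exact test."""
    in_lon = df["Longitud"].between(min(lon_list), max(lon_list))  # inclusive bounds
    in_lat = df["Latitud"].between(min(lat_list), max(lat_list))
    return df[in_lon & in_lat]


lat_list = [42.582213356031694, 42.57966169458114, 42.57945629314298, 42.582142258520136, 42.582213356031694]
lon_list = [-5.880088806152344, -5.880657434463501, -5.879863500595092, -5.879262685775757, -5.880088806152344]

df = pd.DataFrame(
    {
        "Longitud": [-5.879263, -5.879323515030888, -5.870000],
        "Latitud": [42.579535, 42.58192907018969, 42.579535],
    },
    index=pd.Index([0, 1260, 9999], name="ID"),  # 9999 is a made-up point for illustration
)

candidates = bbox_prefilter(df, lon_list, lat_list)
print(candidates.index.tolist())  # [0, 1260]
```

Row 0 passes the box test even though it lies outside the polygon, so the candidates still have to go through an exact check such as intersects() or within(). A spatial index generalizes this idea so that the box comparisons themselves are fast on large data.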
Here is the check_polygon function used with apply() and an expansion of the original df.
def check_polygon(row):
    return Point(row['Longitud'], row['Latitud']).within(polygon)
df_list = [df] * 10000
df_cat = pd.concat(df_list)
After expanding, we time the same filter without a spatial index.
%%timeit
df_cat['inpolygon'] = df_cat.apply(check_polygon, axis=1)
df_cat_slice = df_cat[df_cat['inpolygon'] == True]
This is considerably slower:
6.23 s ± 320 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
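The apply() version builds a Point object and calls within() once per row in Python. As a sketch, assuming shapely 2.0 or later (the answers here do not require it), shapely.contains_xy() can test all coordinate pairs against the polygon in a single vectorized call. The five-row df reuses rows from the sample data above.

```python
import pandas as pd
import shapely  # assumes shapely >= 2.0, which provides shapely.contains_xy
from shapely.geometry import Polygon

lat_list = [42.582213356031694, 42.57966169458114, 42.57945629314298, 42.582142258520136, 42.582213356031694]
lon_list = [-5.880088806152344, -5.880657434463501, -5.879863500595092, -5.879262685775757, -5.880088806152344]
# Polygon vertices are (x, y) = (longitude, latitude) pairs.
polygon = Polygon(list(zip(lon_list, lat_list)))

# Five rows taken from the sample data above.
df = pd.DataFrame(
    {
        "Longitud": [-5.879263, -5.880657, -5.879323515030888, -5.879799662054768, -5.880003215470649],
        "Latitud": [42.579535, 42.582213, 42.58192907018969, 42.58143025825665, 42.58117728748368],
        "Elevación": [937, 933, 933, 933, 933],
    },
    index=pd.Index([0, 1259, 1260, 1261, 1262], name="ID"),
)

# One vectorized call: boolean array, True where the (x, y) point lies inside the polygon.
mask = shapely.contains_xy(polygon, df["Longitud"].to_numpy(), df["Latitud"].to_numpy())
df_inside = df[mask]
print(df_inside.index.tolist())  # [1260, 1261, 1262]
```

Like within() in the apply() version, contains_xy() treats points lying exactly on the polygon's boundary as not inside.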
Upvotes: 1
You can use shapely to create points and polygons and then check whether a point is in a polygon with within(). In this example I'm running it through a function that creates an extra column indicating whether or not the point is in the polygon; then you can filter the df on that column. Note that I added some sample data, because none of the points in your sample df are actually in the polygon:
from shapely.geometry import Point, Polygon
import pandas as pd
data = [ { "ID": 0, "Longitud": -5.879263, "Latitud": 42.579535, "Elevación": 937 }, { "ID": 1, "Longitud": -5.879303, "Latitud": 42.579535, "Elevación": 937 }, { "ID": 2, "Longitud": -5.879342, "Latitud": 42.579535, "Elevación": 937 }, { "ID": 3, "Longitud": -5.879382, "Latitud": 42.579535, "Elevación": 937 }, { "ID": 4, "Longitud": -5.879422, "Latitud": 42.579535, "Elevación": 937 }, { "ID": 1255, "Longitud": -5.880498, "Latitud": 42.582213, "Elevación": 933 }, { "ID": 1256, "Longitud": -5.880538, "Latitud": 42.582213, "Elevación": 933 }, { "ID": 1257, "Longitud": -5.880578, "Latitud": 42.582213, "Elevación": 933 }, { "ID": 1258, "Longitud": -5.880618, "Latitud": 42.582213, "Elevación": 933 }, { "ID": 1259, "Longitud": -5.880657, "Latitud": 42.582213, "Elevación": 933 }, { "ID": 1260, "Longitud": -5.879323515030888, "Latitud": 42.58192907018969, "Elevación": 933 }, { "ID": 1261, "Longitud": -5.879799662054768, "Latitud": 42.58143025825665, "Elevación": 933 }, { "ID": 1262, "Longitud": -5.880003215470649, "Latitud": 42.58117728748368, "Elevación": 933 } ]
df = pd.DataFrame(data)
df = df.set_index('ID')
lat_list = [42.582213356031694, 42.57966169458114, 42.57945629314298, 42.582142258520136, 42.582213356031694]
lon_list = [-5.880088806152344, -5.880657434463501, -5.879863500595092, -5.879262685775757, -5.880088806152344]
polygon = Polygon(zip(lon_list, lat_list))
def check_polygon(row):
    return Point(row['Longitud'], row['Latitud']).within(polygon)
df['inpolygon'] = df.apply(check_polygon, axis=1)
df = df[df['inpolygon'] == True]
Output:
ID | Longitud | Latitud | Elevación | inpolygon |
---|---|---|---|---|
1260 | -5.87932 | 42.5819 | 933 | True |
1261 | -5.8798 | 42.5814 | 933 | True |
1262 | -5.88 | 42.5812 | 933 | True |
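If you would rather not depend on shapely, the same filter can be done with a hand-written even-odd (ray casting) point-in-polygon test in NumPy. This is a different technique from the shapely within() call above, shown as a minimal sketch: points_in_polygon is a hypothetical helper, it treats longitude/latitude as planar coordinates (reasonable for a polygon this small), and points lying exactly on an edge may be classified either way.

```python
import numpy as np
import pandas as pd


def points_in_polygon(x, y, poly_x, poly_y):
    """Hypothetical helper: even-odd (ray casting) test, vectorized over points.

    Returns a boolean array that is True where (x[k], y[k]) is inside the
    polygon with vertices (poly_x, poly_y). Coordinates are treated as planar.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    px = np.asarray(poly_x, dtype=float)
    py = np.asarray(poly_y, dtype=float)
    inside = np.zeros(x.shape, dtype=bool)
    j = len(px) - 1  # previous vertex, so each step handles the edge j -> i
    for i in range(len(px)):
        # Does this edge straddle the horizontal line through each point?
        crosses = (py[i] > y) != (py[j] > y)
        # Horizontal edges divide by zero here; they never cross, so the warnings are suppressed.
        with np.errstate(divide="ignore", invalid="ignore"):
            # x-coordinate where the edge meets that horizontal line.
            x_cross = (px[j] - px[i]) * (y - py[i]) / (py[j] - py[i]) + px[i]
            # Each crossing to the right of a point flips its inside/outside state.
            inside ^= crosses & (x < x_cross)
        j = i
    return inside


lat_list = [42.582213356031694, 42.57966169458114, 42.57945629314298, 42.582142258520136, 42.582213356031694]
lon_list = [-5.880088806152344, -5.880657434463501, -5.879863500595092, -5.879262685775757, -5.880088806152344]

# Five rows taken from the sample data above.
df = pd.DataFrame(
    {
        "Longitud": [-5.879263, -5.880657, -5.879323515030888, -5.879799662054768, -5.880003215470649],
        "Latitud": [42.579535, 42.582213, 42.58192907018969, 42.58143025825665, 42.58117728748368],
        "Elevación": [937, 933, 933, 933, 933],
    },
    index=pd.Index([0, 1259, 1260, 1261, 1262], name="ID"),
)

mask = points_in_polygon(df["Longitud"], df["Latitud"], lon_list, lat_list)
df_inside = df[mask]
print(df_inside.index.tolist())  # [1260, 1261, 1262]
```

The loop runs once per polygon edge rather than once per row, so the per-point work stays vectorized in NumPy; the closing vertex that repeats the first one simply produces a zero-length edge that never counts as a crossing.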
Upvotes: 2
I would first create a shapely polygon from the coordinates using shapely.geometry.Polygon, then convert all the coordinates into shapely.geometry.Point objects and use the polygon's contains() method to see which points are inside it. Then you simply index out the rest. You can also do this with geopandas, but it's optional.
You can see how the contains() method works here
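A minimal sketch of the approach described above, assuming shapely and pandas are installed: build the polygon, turn each row's coordinates into a Point, ask the polygon whether it contains() that point, and use the resulting booleans to index the dataframe. The five-row df reuses points from the sample data in the other answers.

```python
import pandas as pd
from shapely.geometry import Point, Polygon

lat_list = [42.582213356031694, 42.57966169458114, 42.57945629314298, 42.582142258520136, 42.582213356031694]
lon_list = [-5.880088806152344, -5.880657434463501, -5.879863500595092, -5.879262685775757, -5.880088806152344]
# Polygon vertices are (x, y) = (longitude, latitude) pairs.
polygon = Polygon(list(zip(lon_list, lat_list)))

df = pd.DataFrame(
    {
        "Longitud": [-5.879263, -5.880657, -5.879323515030888, -5.879799662054768, -5.880003215470649],
        "Latitud": [42.579535, 42.582213, 42.58192907018969, 42.58143025825665, 42.58117728748368],
        "Elevación": [937, 933, 933, 933, 933],
    },
    index=pd.Index([0, 1259, 1260, 1261, 1262], name="ID"),
)

# True where the polygon contains the row's point (points on the boundary count as outside).
mask = [polygon.contains(Point(lon, lat)) for lon, lat in zip(df["Longitud"], df["Latitud"])]
# Index out the rows that are inside; everything else is dropped.
df_inside = df[mask]
print(df_inside.index.tolist())  # [1260, 1261, 1262]
```

polygon.contains(p) gives the same result as p.within(polygon), the form used in the apply() answer; which one you write is a matter of taste.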
Upvotes: 0