Reputation: 61
I have a GeoDataFrame with a column of shapely.polygons. Some of them distinct, some - not:
In [1]: gdf
Out[2]:
geometry
1 POLYGON ((1 1, 1 2, 2 2, 2 1, 1 1))
2 POLYGON ((1 3, 1 4, 2 4, 2 3, 1 3))
3 POLYGON ((1 1, 1 2, 2 2, 2 1, 1 1))
4 POLYGON ((3 1, 3 2, 4 2, 4 1, 3 1))
5 POLYGON ((1 3, 1 4, 2 4, 2 3, 1 3))
I need to find only distinct (non-overlapping) polygons:
In [1]: gdf_distinct
Out[2]:
geometry
1 POLYGON ((1 1, 1 2, 2 2, 2 1, 1 1))
2 POLYGON ((1 3, 1 4, 2 4, 2 3, 1 3))
4 POLYGON ((3 1, 3 2, 4 2, 4 1, 3 1))
As polygons are not hashable, I can not use simple ways in Pandas:
In [1]: gdf_distinct = gdf['geometry'].unique()
TypeError: unhashable type: 'Polygon'
Are there any simple and efficient ways to have a new GeoDataFrame with only distinct polygons?
P.S.:
I found one way, but it works only with fully-duplicate polygons and, as I think, not very efficient:
In [1]: m = []
for index, row in gdf.iterrows():]
if row['geometry'] not in m:
m.append(row['geometry'])
gdf_distinct = GeoDataFrame(geometry=m)
Upvotes: 5
Views: 2823
Reputation: 480
Thanks @Paul H for the great answer and @alphabetasoup for the thoughtful comment. My solution builds upon this answer for my use case, which is finding the overlapping polygons.
# Find polygons in a geopandas dataframe that overlap with another polygon
# in the same dataframe as well as non-overlapping polygons
overlapping = []
non_overlapping = []
for n, p in enumerate(list(gdf.geometry), 0):
overlaps = []
for g in list(gdf.geometry):
overlaps.append(g.overlaps(p))
if any(overlaps):
overlapping.append(p)
if not any(overlaps):
non_overlapping.append(p)
My use case also required retaining other columns from the original geopandas geodataframe. Here's how I did that:
overlapping = []
non_overlapping = []
for n, p in enumerate(list(gdf.geometry), 0):
if any(p.overlaps(g) for g in list(gdf.geometry)):
# Store the index from the original dataframe
overlapping.append(n)
if not any(p.overlaps(g) for g in list(gdf.geometry)):
non_overlapping.append(n)
# Create a new dataframes and reset their indexes
gdf_overlapping = gdf.iloc[overlapping]
gdf_overlapping.reset_index(drop=True, inplace=True)
gdf_non_overlapping = gdf.iloc[non_overlapping]
gdf_non_overlapping.reset_index(drop=True, inplace=True)
For this I made a few code modifications from the original answer:
for n, p in enumerate(polygons[:-1], 1):
-> polygons[:]
= polygons
list(gdf.geometry)
instead of a list of polygonsfor n, p in enumerate(polygons[:-1], 1):
-> for n, p in enumerate(polygons[:-1], 0):
for g in polygons[n:]:
-> for g in polygons:
since the former method missed some overlapping polygons because it would not check the current polygon against all of the polygons (just the polygons at that particular index to the end)non_overlapping.append(str(p))
-> non_overlapping.append(p)
; then later stored polygon's index to later slice the original geodataframe: non_overlapping.append(p)
-> non_overlapping.append(n)
Upvotes: 0
Reputation: 68146
Let's start with a list of 4 polygons, three of which overlap other polygons:
from shapely.geometry import Polygon
import geopandas
polygons = [
Polygon([[1, 1], [1, 3], [3, 3], [3, 1], [1, 1]]),
Polygon([[1, 3], [1, 5], [3, 5], [3, 3], [1, 3]]),
Polygon([[2, 2], [2, 3.5], [3.5, 3.5], [3.5, 2], [2, 2]]),
Polygon([[3, 1], [3, 2], [4, 2], [4, 1], [3, 1]]),
]
gdf = geopandas.GeoDataFrame(data={'A': list('ABCD')}, geometry=polygons)
gdf.plot(column='A', alpha=0.75)
They look like this:
So we can loop through each one, then loop through all of the others and check for overlaps with the shapely
API. If there are not any overlaps, we'll append it to our output list:
non_overlapping = []
for p in polygons:
overlaps = []
for g in filter(lambda g: not g.equals(p), polygons):
overlaps.append(g.overlaps(p))
if not any(overlaps):
non_overlapping.append(p)
Any that gives me:
['POLYGON ((3 1, 3 2, 4 2, 4 1, 3 1))']
Which is what I'd expect.
But this is effectively O(N^2), and I don't think it has to be.
So let's try to never check the same pair twice:
non_overlapping = []
for n, p in enumerate(polygons[:-1], 1): # don't include the last element
overlaps = []
for g in polygons[n:]: # loop from the next element to the end
overlaps.append(g.overlaps(p))
if not any(overlaps):
non_overlapping.append(str(p))
And I get the same result and it's a smidge faster on my machine.
We can compress the loop a bit by using generator in the if
statement instead of a normal for
block:
non_overlapping = []
for n, p in enumerate(polygons[:-1], 1):
if not any(p.overlaps(g) for g in polygons[n:]):
non_overlapping.append(p)
Same story.
Upvotes: 4