Headshaker
Headshaker

Reputation: 61

Find non-overlapping polygons in GeoDataFrame

I have a GeoDataFrame with a column of shapely.polygons. Some of them distinct, some - not:

In [1]: gdf
Out[2]:
    geometry
1   POLYGON ((1 1, 1 2, 2 2, 2 1, 1 1))
2   POLYGON ((1 3, 1 4, 2 4, 2 3, 1 3))
3   POLYGON ((1 1, 1 2, 2 2, 2 1, 1 1))
4   POLYGON ((3 1, 3 2, 4 2, 4 1, 3 1))
5   POLYGON ((1 3, 1 4, 2 4, 2 3, 1 3))

I need to find only distinct (non-overlapping) polygons:

In [1]: gdf_distinct
Out[2]:
    geometry
1   POLYGON ((1 1, 1 2, 2 2, 2 1, 1 1))
2   POLYGON ((1 3, 1 4, 2 4, 2 3, 1 3))
4   POLYGON ((3 1, 3 2, 4 2, 4 1, 3 1))

As polygons are not hashable, I can not use simple ways in Pandas:

In [1]: gdf_distinct = gdf['geometry'].unique()

TypeError: unhashable type: 'Polygon'

Are there any simple and efficient ways to have a new GeoDataFrame with only distinct polygons?

P.S.:

I found one way, but it works only with fully-duplicate polygons and, as I think, not very efficient:

In [1]: m = []
        for index, row in gdf.iterrows():]
           if row['geometry'] not in m:
              m.append(row['geometry'])
        gdf_distinct = GeoDataFrame(geometry=m)

Upvotes: 5

Views: 2823

Answers (2)

Wilson Sauthoff
Wilson Sauthoff

Reputation: 480

Thanks @Paul H for the great answer and @alphabetasoup for the thoughtful comment. My solution builds upon this answer for my use case, which is finding the overlapping polygons.

# Find polygons in a geopandas dataframe that overlap with another polygon 
# in the same dataframe as well as non-overlapping polygons
overlapping = []
non_overlapping = []
for n, p in enumerate(list(gdf.geometry), 0):
    overlaps = []
    for g in list(gdf.geometry):
        overlaps.append(g.overlaps(p))
    if any(overlaps):
        overlapping.append(p)  
    if not any(overlaps):
        non_overlapping.append(p)

My use case also required retaining other columns from the original geopandas geodataframe. Here's how I did that:

overlapping = []
non_overlapping = []
for n, p in enumerate(list(gdf.geometry), 0):
    if any(p.overlaps(g) for g in list(gdf.geometry)):
        # Store the index from the original dataframe
        overlapping.append(n)
    if not any(p.overlaps(g) for g in list(gdf.geometry)):
        non_overlapping.append(n)

# Create a new dataframes and reset their indexes
gdf_overlapping = gdf.iloc[overlapping]  
gdf_overlapping.reset_index(drop=True, inplace=True)
gdf_non_overlapping = gdf.iloc[non_overlapping]
gdf_non_overlapping.reset_index(drop=True, inplace=True)

For this I made a few code modifications from the original answer:

  1. I found that I needed to include the last element so that I didn't lose one of the overlapping polygons: for n, p in enumerate(polygons[:-1], 1): -> polygons[:] = polygons
  2. I used my geopandas geodataframe list(gdf.geometry) instead of a list of polygons
  3. I used Python's zero-based indexing to retain the index location of each polygon in my original geopandas geodataframe for n, p in enumerate(polygons[:-1], 1): -> for n, p in enumerate(polygons[:-1], 0):
  4. I changed checking polygons from the next indexed element to the end to all the polygons for g in polygons[n:]: -> for g in polygons: since the former method missed some overlapping polygons because it would not check the current polygon against all of the polygons (just the polygons at that particular index to the end)
  5. I did not store the polygon as a string, but instead stored the polygon itself non_overlapping.append(str(p)) -> non_overlapping.append(p); then later stored polygon's index to later slice the original geodataframe: non_overlapping.append(p) -> non_overlapping.append(n)

Upvotes: 0

Paul H
Paul H

Reputation: 68146

Let's start with a list of 4 polygons, three of which overlap other polygons:

from shapely.geometry import Polygon
import geopandas

polygons = [
    Polygon([[1, 1], [1, 3], [3, 3], [3, 1], [1, 1]]),
    Polygon([[1, 3], [1, 5], [3, 5], [3, 3], [1, 3]]),
    Polygon([[2, 2], [2, 3.5], [3.5, 3.5], [3.5, 2], [2, 2]]),
    Polygon([[3, 1], [3, 2], [4, 2], [4, 1], [3, 1]]),
]
gdf = geopandas.GeoDataFrame(data={'A': list('ABCD')}, geometry=polygons)
gdf.plot(column='A', alpha=0.75)

They look like this:

enter image description here

So we can loop through each one, then loop through all of the others and check for overlaps with the shapely API. If there are not any overlaps, we'll append it to our output list:

non_overlapping = []
for p in polygons:
    overlaps = []
    for g in filter(lambda g: not g.equals(p), polygons):
        overlaps.append(g.overlaps(p))

    if not any(overlaps):
        non_overlapping.append(p)

Any that gives me:

['POLYGON ((3 1, 3 2, 4 2, 4 1, 3 1))']

Which is what I'd expect.

But this is effectively O(N^2), and I don't think it has to be.

So let's try to never check the same pair twice:

non_overlapping = []
for n, p in enumerate(polygons[:-1], 1):  # don't include the last element
    overlaps = []
    for g in polygons[n:]:  # loop from the next element to the end
        overlaps.append(g.overlaps(p))

    if not any(overlaps):
        non_overlapping.append(str(p))

And I get the same result and it's a smidge faster on my machine.

We can compress the loop a bit by using generator in the if statement instead of a normal for block:

non_overlapping = []
for n, p in enumerate(polygons[:-1], 1):
    if not any(p.overlaps(g) for g in polygons[n:]):
        non_overlapping.append(p)

Same story.

Upvotes: 4

Related Questions