jjdblast

Reputation: 535

Python: merge/join two dataframes by a geometric condition

I have a GeoDataFrame of Points df_points (~50k) and a GeoDataFrame of Polygons df_polygons (~50k).

I want to merge the two dataframes, keeping the columns of df_points and adding, for each point, the matching columns from df_polygons. A polygon matches a point when the point lies inside that polygon.

import geopandas as gpd
from shapely.geometry import Point, Polygon

_polygons = [ Polygon([(5, 5), (5, 13), (13, 13), (13, 5)]), Polygon([(10, 10), (10, 15), (15, 15), (15, 10)]) ]
_pnts = [Point(3, 3), Point(8, 8), Point(11, 11)]
df_polygons = gpd.GeoDataFrame(geometry=_polygons, index=['foo', 'bar']).reset_index()
df_points = gpd.GeoDataFrame(geometry=_pnts, index=['A', 'B', 'C']).reset_index()

df_points looks like:

> df_points
    index   geometry
0   A       POINT (3.00000 3.00000)
1   B       POINT (8.00000 8.00000)
2   C       POINT (11.00000 11.00000)

df_polygons looks like:

> df_polygons
    index   geometry
0   foo     POLYGON ((5.00000 5.00000, 5.00000 13.00000, 1...
1   bar     POLYGON ((10.00000 10.00000, 10.00000 15.00000...

The desired result looks like:

    index   geometry_points            geometry_index   geometry_polygons
0   A       POINT (3.00000 3.00000)    []               []
1   B       POINT (8.00000 8.00000)    ['foo']          [Polygon([(5, 5), (5, 13), (13, 13), (13, 5)])]
2   C       POINT (11.00000 11.00000)  ['foo','bar']    [Polygon([(5, 5), (5, 13), (13, 13), (13, 5)]), Polygon([(10, 10), (10, 15), (15, 15), (15, 10)])]

Is there any way to merge the dataframes efficiently?

Upvotes: 0

Views: 989

Answers (1)

Corralien

Reputation: 120509

Use a spatial join (gpd.sjoin):

# Rename 'index' columns to avoid FutureWarning
dfp = df_points.rename(columns={'index': 'point'})
dfa = df_polygons.rename(columns={'index': 'area'})

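# Find the polygons each point lies within. how='left' keeps points that
# match no polygon (their 'area' and 'index_right' are NaN).
# Note: geopandas < 0.10 spells the 'predicate' argument 'op'.
out = gpd.sjoin(dfp, dfa, how='left', predicate='within')

# Reduce to one row per point, collecting matches into lists
# (dropna() turns the NaN of an unmatched point into an empty list)
out = out.groupby('point') \
         .agg({'area': lambda x: x.dropna().tolist(),
               'index_right': lambda x: dfa.loc[x.dropna().astype(int),
                                                'geometry'].tolist()}) \
         .reset_index()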

# Append columns
dfp = dfp.merge(out, on='point')

Output:

>>> dfp
  point                   geometry        area                                        index_right
0     A    POINT (3.00000 3.00000)          []                                                 []
1     B    POINT (8.00000 8.00000)       [foo]          [POLYGON ((5 5, 5 13, 13 13, 13 5, 5 5))]
2     C  POINT (11.00000 11.00000)  [foo, bar]  [POLYGON ((5 5, 5 13, 13 13, 13 5, 5 5)), POLY...
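
For intuition, here is a minimal pure-Python sketch of what the join plus aggregation computes for the sample data: for every point, the list of polygons that contain it. This is not how geopandas does it (sjoin uses a spatial index rather than scanning all pairs). The helper point_in_polygon and the plain dicts are hypothetical names used only for this illustration, and the ray-casting test does not treat points lying exactly on an edge consistently.

```python
def point_in_polygon(pt, ring):
    """Ray casting: True if pt is inside the ring of (x, y) vertices."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings to the right of pt; an odd count means inside
            if x < x_cross:
                inside = not inside
    return inside


# Same sample data as in the question, as plain coordinate tuples
polygons = {
    'foo': [(5, 5), (5, 13), (13, 13), (13, 5)],
    'bar': [(10, 10), (10, 15), (15, 15), (15, 10)],
}
points = {'A': (3, 3), 'B': (8, 8), 'C': (11, 11)}

# Brute-force all pairs: O(len(points) * len(polygons)) containment tests
matches = {
    name: [area for area, ring in polygons.items() if point_in_polygon(pt, ring)]
    for name, pt in points.items()
}

print(matches)  # {'A': [], 'B': ['foo'], 'C': ['foo', 'bar']}
```

With ~50k points and ~50k polygons this all-pairs scan means about 2.5 billion containment tests, which is why the spatial join above is the practical choice.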

Upvotes: 3
