doppler

Reputation: 997

Merging GeoDataFrames on both Point geometry and other columns at the same time

How would one go about merging two GeoDataFrames on both the Point geometry and arbitrary other columns at the same time? I realize the task is ambiguous for geometries other than Point, because "equality" is not well defined for lines and polygons, but still.

The following MWE throws an error if I simply try gdf2.merge(gdf), correctly complaining that

unhashable type: 'Point'.

How does one work around this?

import geopandas as gpd
import pandas as pd
from io import StringIO
import shapely

df = pd.read_csv(StringIO('''
Name Value x y
'a' 1.5 0. 0.
'b' 22  0. 1.
'c' 0.2 0. 1.
'''), sep=r"\s+", engine='python')

df2 = pd.read_csv(StringIO('''
Name OtherValue x y
'a' 9.9 0. 0.
'b' 4.5 0. 1.
'c' 2e3 1. 1.
'''), sep=r"\s+", engine='python')

def dataframe_to_geodataframe(df):
    geometry = [shapely.geometry.Point(xy) for xy in zip(df.x, df.y)]
    df = df.drop(['x','y'], axis=1)
    gdf = gpd.GeoDataFrame(df, geometry=geometry)
    return gdf

gdf = dataframe_to_geodataframe(df)
gdf2 = dataframe_to_geodataframe(df2)

gdf.merge(gdf2,how='left')

Output would ideally be something like

  Name  Value     geometry OtherValue
0  'a'    1.5  POINT (0 0)        9.9
1  'b'   22.0  POINT (0 1)        4.5
2  'c'    0.2  POINT (0 1)        NaN

(of course depending on the how keyword).

(I do realize this can be done easily after converting back to ordinary pandas DataFrames, but I feel there should be a way to do this without converting back and forth.)

Upvotes: 2

Views: 1425

Answers (1)

ewcz

Reputation: 13097

One (perhaps dirty) way would be to make the Point hashable by extending the class shapely.geometry.Point:

class HPoint(shapely.geometry.Point):
    # Hash the same coordinate tuples that Point equality compares,
    # so that equal points always produce equal hashes.
    def __hash__(self):
        return hash(tuple(self.coords))

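This works because the equality operator for Point (provided via the parent class BaseGeometry) just compares the coordinate tuples, so hashing those same tuples guarantees that equal points get equal hashes. Python treats a class that defines __eq__ without a matching __hash__ as unhashable, which is consistent with the error in the question.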

Then you could use this class as:

def dataframe_to_geodataframe(df):
    geometry = [HPoint(xy) for xy in zip(df.x, df.y)]
    df = df.drop(['x','y'], axis=1)
    gdf = gpd.GeoDataFrame(df, geometry=geometry)
    return gdf

gdf = dataframe_to_geodataframe(df)
gdf2 = dataframe_to_geodataframe(df2)

print(gdf2.merge(gdf, how='right'))

which yields:

  Name  OtherValue     geometry  Value
0  'a'         9.9  POINT (0 0)    1.5
1  'b'         4.5  POINT (0 1)   22.0
2  'c'         NaN  POINT (0 1)    0.2
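
To see why adding __hash__ is enough, here is a minimal sketch that does not need Shapely at all. Coord and HCoord are hypothetical stand-ins (not part of Shapely or GeoPandas) that only mimic "equality compares coordinate tuples"; the dict lookup at the end imitates a left merge on both Name and geometry, it is not what pandas does internally.

```python
class Coord:
    """Hypothetical stand-in for a geometry whose equality compares coordinates."""

    def __init__(self, x, y):
        self.coords = [(float(x), float(y))]

    def __eq__(self, other):
        # Defining __eq__ without __hash__ makes instances unhashable.
        return isinstance(other, Coord) and self.coords == other.coords


class HCoord(Coord):
    """Same equality, plus a hash built from the same coordinate tuples."""

    def __hash__(self):
        return hash(tuple(self.coords))


def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


# Rows of the left table: (Name, point, Value).
left = [
    ("a", HCoord(0, 0), 1.5),
    ("b", HCoord(0, 1), 22.0),
    ("c", HCoord(0, 1), 0.2),
]

# Right table keyed on (Name, point); this only works because HCoord is hashable.
right = {
    ("a", HCoord(0, 0)): 9.9,
    ("b", HCoord(0, 1)): 4.5,
    ("c", HCoord(1, 1)): 2e3,
}

# Left-join style lookup: rows without a match on BOTH keys get None.
merged = [(name, value, right.get((name, pt))) for name, pt, value in left]
```

The 'c' row ends up without a match because its point differs between the two tables, which mirrors the NaN in the output above. As far as I know, Shapely 2.0 and later make geometries hashable out of the box, so the subclass should only be needed on older releases; treat that as something to check against your installed version.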

Upvotes: 4
