How would one go about merging two GeoDataFrames on both `Point` geometry and arbitrary other columns at the same time? I realize this task is ambiguous for all geometries other than `Point`, because "equality" is not well defined for Lines and Polygons, but still.

The following MWE throws an error if I simply try `gdf2.merge(gdf)`, correctly complaining `unhashable type: 'Point'`. How does one work around this?
```python
import geopandas as gpd
import pandas as pd
from io import StringIO
import shapely.geometry

# Columns are separated by runs of whitespace
df = pd.read_csv(StringIO('''
Name Value x y
'a' 1.5 0. 0.
'b' 22 0. 1.
'c' 0.2 0. 1.
'''), sep=r"\s+", engine='python')

df2 = pd.read_csv(StringIO('''
Name OtherValue x y
'a' 9.9 0. 0.
'b' 4.5 0. 1.
'c' 2e3 1. 1.
'''), sep=r"\s+", engine='python')

def dataframe_to_geodataframe(df):
    # Build one Point per row from the x/y columns, then drop those columns
    geometry = [shapely.geometry.Point(xy) for xy in zip(df.x, df.y)]
    df = df.drop(['x', 'y'], axis=1)
    gdf = gpd.GeoDataFrame(df, geometry=geometry)
    return gdf

gdf = dataframe_to_geodataframe(df)
gdf2 = dataframe_to_geodataframe(df2)

# Fails with "unhashable type: 'Point'"
gdf.merge(gdf2, how='left')
```
Output would ideally be something like

```
  Name  Value     geometry  OtherValue
0  'a'    1.5  POINT (0 0)         9.9
1  'b'   22.0  POINT (0 1)         4.5
2  'c'    0.2  POINT (0 1)         NaN
```

(of course depending on the `how` keyword).
(I do realize this can be done easily after converting back to ordinary pandas DataFrames, but I feel there should be a way to do this without converting back and forth.)
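For comparison, the plain-pandas route mentioned above can be sketched as merging on `Name` plus the raw `x`/`y` columns before any geometry is built, since plain floats are hashable. The sample data is typed inline here (names without the quote characters) so the sketch runs on its own; building the GeoDataFrame afterwards with geopandas' `points_from_xy` is shown only as a comment.

```python
import pandas as pd

# Sample data from the question, typed inline (names without quote characters)
df = pd.DataFrame({"Name": ["a", "b", "c"], "Value": [1.5, 22.0, 0.2],
                   "x": [0.0, 0.0, 0.0], "y": [0.0, 1.0, 1.0]})
df2 = pd.DataFrame({"Name": ["a", "b", "c"], "OtherValue": [9.9, 4.5, 2e3],
                    "x": [0.0, 0.0, 1.0], "y": [0.0, 1.0, 1.0]})

# Float coordinate columns are hashable, so they work as merge keys directly.
# Matching is by exact float equality.
merged = df.merge(df2, on=["Name", "x", "y"], how="left")

# The GeoDataFrame could then be built once, after the merge, e.g.:
#   gdf = gpd.GeoDataFrame(merged.drop(columns=["x", "y"]),
#                          geometry=gpd.points_from_xy(merged.x, merged.y))
print(merged)
```

Row `c` keeps `NaN` for `OtherValue` because its coordinates differ between the two frames.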
One (perhaps dirty) way would be to make the `Point` hashable by extending the class `shapely.geometry.Point`:
```python
class HPoint(shapely.geometry.Point):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def __hash__(self):
        # Hash on the coordinates, consistent with Point's coordinate-based equality
        return hash(tuple(self.coords))
```
This is based on the fact that the equality operator for `Point` (provided via the parent class `BaseGeometry`) just compares the coordinate tuples.
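The contract this relies on is that objects which compare equal must also hash equal, because pandas factorizes merge keys through hash tables. A minimal sketch of that contract without Shapely, using a hypothetical `CoordKey` class that only imitates `Point`'s `coords` attribute:

```python
import pandas as pd


class CoordKey:
    """Hypothetical stand-in for a Point: equal iff coordinates are equal,
    and hashed on the same coordinate tuple so equal keys hash equally."""

    def __init__(self, x, y):
        self.coords = ((float(x), float(y)),)  # one (x, y) pair, like Point.coords

    def __eq__(self, other):
        if not isinstance(other, CoordKey):
            return NotImplemented
        return self.coords == other.coords

    def __hash__(self):
        return hash(tuple(self.coords))

    def __repr__(self):
        return f"CoordKey{self.coords[0]}"


left = pd.DataFrame({"Name": ["a", "b"], "key": [CoordKey(0, 0), CoordKey(0, 1)]})
right = pd.DataFrame({"Name": ["a", "b"], "key": [CoordKey(0, 0), CoordKey(1, 1)],
                      "OtherValue": [9.9, 4.5]})

# Works because each key object is hashable and hashes consistently with ==
merged = left.merge(right, on=["Name", "key"], how="left")
print(merged)
```

Defining `__eq__` without `__hash__` would make the class unhashable again, which is exactly the situation the question runs into.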
Then you could use this class as:
```python
def dataframe_to_geodataframe(df):
    # Same as in the question, but building hashable HPoint geometries
    geometry = [HPoint(xy) for xy in zip(df.x, df.y)]
    df = df.drop(['x', 'y'], axis=1)
    gdf = gpd.GeoDataFrame(df, geometry=geometry)
    return gdf

gdf = dataframe_to_geodataframe(df)
gdf2 = dataframe_to_geodataframe(df2)
print(gdf2.merge(gdf, how='right'))
```
which yields:
```
  Name  OtherValue     geometry  Value
0  'a'         9.9  POINT (0 0)    1.5
1  'b'         4.5  POINT (0 1)   22.0
2  'c'         NaN  POINT (0 1)    0.2
```
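A variant that avoids subclassing `Point` (the `HPoint` trick depends on Shapely's class internals, which may differ between Shapely versions) is to merge on temporary coordinate columns derived from the geometry and drop them afterwards. This is a sketch under stated assumptions: the helper `merge_on_points` and the `_key_x`/`_key_y` column names are hypothetical, not part of geopandas; every geometry is assumed to be a `Point` (anything exposing `.x` and `.y`); coordinates match by exact float equality; and only the left frame's geometry is kept, so it is meant for `how='left'` or `'inner'`.

```python
import pandas as pd


def merge_on_points(left, right, on=(), how="left", geom_col="geometry"):
    """Hypothetical helper: merge two (Geo)DataFrames on point coordinates
    plus the extra columns listed in `on`.

    Assumes every value in `geom_col` exposes `.x` and `.y` (as shapely
    Points do). The right frame's geometry is dropped, so rows that exist
    only on the right (how="right"/"outer") end up without a geometry.
    """
    keys = ["_key_x", "_key_y"]  # hypothetical temporary column names

    def with_keys(frame):
        # Plain floats are hashable, unlike the geometry objects themselves
        return frame.assign(
            _key_x=[float(p.x) for p in frame[geom_col]],
            _key_y=[float(p.y) for p in frame[geom_col]],
        )

    right_keyed = with_keys(right).drop(columns=geom_col)
    merged = with_keys(left).merge(right_keyed, on=list(on) + keys, how=how)
    return merged.drop(columns=keys)
```

With the question's `gdf` and `gdf2`, `merge_on_points(gdf, gdf2, on=["Name"], how="left")` should give the layout the question asks for, with the left frame's `geometry` column kept in place.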