gabe

Reputation: 37

Spatial join by geometry and time in Python (GeoPandas)

I'm trying to merge two GeoDataFrames (say 'df_a' and 'df_b'), and I want to join them by both time and geometry.

The geometry of 'df_a' is of type MultiPolygon:

time feature 1 ... geometry
'2017-01-01' 10 ... MULTIPOLYGON (((-35.12334 3.12648, -35.12334 2...
... ... ... ...
'2020-12-31' 4 ... MULTIPOLYGON (((-18.12334 21.11820, -18.12334 ...

The geometry of 'df_b' is of type Point:

time feature 2 ... geometry
'2017-08-01' 1 ... POINT (-35.25000 3.00000)
... ... ... ...
'2020-10-15' 7 ... POINT (-34.25000 3.00000)

As you may have noticed, the time values in the two data frames differ. Also, 'df_a' holds MultiPolygon geometries, while 'df_b' holds Points.

How can I join these two data frames on both 'time' and 'geometry' using GeoPandas in Python?

FYI

  1. Both data frames use the same CRS, WGS 84 (EPSG:4326).
  2. 'df_a' covers a wider time range.
  3. Every point in 'df_b' lies inside a multipolygon of 'df_a'.
  4. Some multipolygons may not contain any points.

As far as I know, there is no parameter for specifying join keys (as there is in pandas merge). The only spatial join I know of in GeoPandas is:

gpd.sjoin(df_a, df_b, how='left', predicate='intersects')

Is there a way to do this?

Upvotes: 0

Views: 872

Answers (1)

Rob Raymond

Reputation: 31226

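Both frames have a 'time' column, so sjoin() suffixes them as time_left and time_right. You can then filter the joined rows down to those where the two times match: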

gpd.sjoin(df_a, df_b, how="left", predicate="intersects").loc[
    lambda d: d["time_left"].eq(d["time_right"])
]
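To make the effect of the filter easier to see, here is a minimal self-contained sketch on toy data. The squares, points, dates and feature values are made up for illustration, and the 7-day tolerance variant at the end is my own assumption about what might help when the timestamps in the two frames do not line up exactly; it is not part of the question.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point, box

# Hypothetical toy polygons: two squares, each observed on two dates.
df_a = gpd.GeoDataFrame(
    {
        "time": pd.to_datetime(
            ["2017-01-01", "2017-01-16", "2017-01-01", "2017-01-16"]
        ),
        "feature 1": [10, 11, 12, 13],
    },
    geometry=[box(0, 0, 1, 1), box(0, 0, 1, 1), box(2, 2, 3, 3), box(2, 2, 3, 3)],
    crs="EPSG:4326",
)

# Hypothetical toy points: both lie inside the first square, on different dates.
df_b = gpd.GeoDataFrame(
    {
        "time": pd.to_datetime(["2017-01-01", "2017-01-20"]),
        "feature 2": [1, 7],
    },
    geometry=[Point(0.5, 0.5), Point(0.25, 0.75)],
    crs="EPSG:4326",
)

# Spatial join only: every polygon row is paired with every point inside it,
# regardless of time. The shared "time" column is suffixed _left / _right.
joined = gpd.sjoin(df_a, df_b, how="left", predicate="intersects")

# Keep only pairs whose timestamps are identical.
exact = joined.loc[lambda d: d["time_left"].eq(d["time_right"])]

# Assumed variant: keep pairs whose timestamps are at most 7 days apart.
near = joined.loc[
    lambda d: (d["time_left"] - d["time_right"]).abs().le(pd.Timedelta(days=7))
]

print(exact[["time_left", "feature 1", "time_right", "feature 2"]])
print(near[["time_left", "feature 1", "time_right", "feature 2"]])
```

Note that the filter also drops the polygons that matched no point, because their time_right is missing and never compares equal, so the result behaves like an inner join even though how="left" was used.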

MWE for generating the data sets:

import geopandas as gpd
import pandas as pd

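# synthesize some geodataframes matching structure in question
# note: here df_a holds points (cities) and df_b holds polygons (countries),
# the reverse of the question; the join-then-filter approach is the same
# note: gpd.datasets was removed in geopandas 1.0, so this needs an older version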
df_a = (
    pd.merge(
        pd.Series(pd.date_range("1-jan-2017", "31-dec-2020", freq="15D"), name="time"),
        gpd.read_file(gpd.datasets.get_path("naturalearth_cities"))
        .reset_index()
        .rename(columns={"index": "feature 1"})
        .drop(columns=["name"]),
        how="cross",
    )
    .sample(100)
    .sort_values(["time", "feature 1"])
)
df_a = gpd.GeoDataFrame(df_a)

df_b = (
    pd.merge(
        pd.Series(pd.date_range("1-jan-2017", "31-dec-2020", freq="15D"), name="time"),
        gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
        .reset_index()
        .rename(columns={"index": "feature 2"})
        .drop(columns=["name", "pop_est", "continent", "gdp_md_est"]),
        how="cross",
    )
    # .sample(50)
    .sort_values(["time", "feature 2"])
)
df_b = gpd.GeoDataFrame(df_b)

Output (first rows of the filtered join):

time_left feature 1 geometry index_right time_right feature 2 iso_a3
105 2017-01-01 00:00:00 105 POINT (-88.76707299981655 17.252033507246892) 39 2017-01-01 00:00:00 39 BLZ
120 2017-01-01 00:00:00 120 POINT (18.383001666953305 43.850022398954934) 170 2017-01-01 00:00:00 170 BIH
283 2017-01-16 00:00:00 81 POINT (-89.2049870794599 13.711947505494038) 214 2017-01-16 00:00:00 37 SLV
527 2017-01-31 00:00:00 123 POINT (44.06531001666542 9.56002239881775) 521 2017-01-31 00:00:00 167 -99
600 2017-01-31 00:00:00 196 POINT (-74.08528981377441 4.598369421147822) 386 2017-01-31 00:00:00 32 COL
861 2017-03-02 00:00:00 53 POINT (-86.27043751890119 12.154962438756115) 743 2017-03-02 00:00:00 35 NIC
1001 2017-03-02 00:00:00 193 POINT (116.38633982565943 39.93083808990906) 847 2017-03-02 00:00:00 139 CHN
1161 2017-03-17 00:00:00 151 POINT (-69.90203094331503 18.472018713195382) 902 2017-03-17 00:00:00 17 DOM
1451 2017-04-16 00:00:00 37 POINT (10.179678099212026 36.80277813623144) 1320 2017-04-16 00:00:00 81 TUN
1589 2017-04-16 00:00:00 175 POINT (13.399602764700546 52.523764522251156) 1360 2017-04-16 00:00:00 121 DEU

Upvotes: 1
