Reputation: 37
I'm trying to merge two GeoDataFrames (let's say 'df_a' and 'df_b'), and I want to join them on both time and geometry.
In 'df_a', the geometry type is 'MultiPolygon'.
time | feature 1 | ... | geometry |
---|---|---|---|
'2017-01-01' | 10 | ... | MULTIPOLYGON (((-35.12334 3.12648, -35.12334 2... |
... | ... | ... | ... |
'2020-12-31' | 4 | ... | MULTIPOLYGON (((-18.12334 21.11820, -18.12334 ... |
In 'df_b', the geometry type is 'Point'.
time | feature 2 | ... | geometry |
---|---|---|---|
'2017-08-01' | 1 | ... | POINT (-35.25000 3.00000) |
... | ... | ... | ... |
'2020-10-15' | 7 | ... | POINT (-34.25000 3.00000) |
As you may have noticed, the two data frames cover different time steps. Also, 'df_a' has MultiPolygon geometries, while 'df_b' has Point geometries.
How can I join these two dfs on both 'time' and 'geometry' using geopandas in Python?
FYI
As far as I know, there is no parameter for designating extra join keys (like pandas merge has). The only spatial join I know of in geopandas is
gpd.sjoin(df_a, df_b, how='left', predicate='intersects')
Is there any method to do that?
Upvotes: 0
Views: 872
Reputation: 31226
You can filter the rows after performing sjoin():
gpd.sjoin(df_a, df_b, how="left", predicate="intersects").loc[
    lambda d: d["time_left"].eq(d["time_right"])
]
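The two steps — a spatial join, then an equality filter on the suffixed time columns — can be sketched on a tiny hand-made pair of frames (the data below is made up purely for illustration):

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point, Polygon

# Two dated polygons in df_a...
df_a = gpd.GeoDataFrame(
    {
        "time": pd.to_datetime(["2017-01-01", "2017-01-01"]),
        "feature 1": [10, 4],
        "geometry": [
            Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
            Polygon([(5, 5), (7, 5), (7, 7), (5, 7)]),
        ],
    }
)
# ...and two dated points in df_b; both points fall inside the first
# polygon, but only the first point shares its timestamp.
df_b = gpd.GeoDataFrame(
    {
        "time": pd.to_datetime(["2017-01-01", "2017-08-01"]),
        "feature 2": [1, 7],
        "geometry": [Point(1, 1), Point(1, 1)],
    }
)

# sjoin keeps every geometrically intersecting pair; the clashing
# "time" columns are renamed time_left / time_right...
joined = gpd.sjoin(df_a, df_b, how="left", predicate="intersects")
# ...and the filter then drops pairs whose timestamps differ.
matched = joined.loc[joined["time_left"].eq(joined["time_right"])]
```

Only the first polygon/point pair survives: the second point intersects the same polygon but carries a different date, and the second polygon matches nothing (its NaN `time_right` fails the equality test).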
import geopandas as gpd
import pandas as pd

# synthesize some geodataframes matching structure in question
# (note: gpd.datasets was removed in geopandas 1.0; there, load the
# naturalearth files via the geodatasets package or a local path instead)
df_a = (
    pd.merge(
        pd.Series(pd.date_range("1-jan-2017", "31-dec-2020", freq="15D"), name="time"),
        gpd.read_file(gpd.datasets.get_path("naturalearth_cities"))
        .reset_index()
        .rename(columns={"index": "feature 1"})
        .drop(columns=["name"]),
        how="cross",
    )
    .sample(100)
    .sort_values(["time", "feature 1"])
)
df_a = gpd.GeoDataFrame(df_a)
df_b = (
    pd.merge(
        pd.Series(pd.date_range("1-jan-2017", "31-dec-2020", freq="15D"), name="time"),
        gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
        .reset_index()
        .rename(columns={"index": "feature 2"})
        .drop(columns=["name", "pop_est", "continent", "gdp_md_est"]),
        how="cross",
    )
    # .sample(50)
    .sort_values(["time", "feature 2"])
)
df_b = gpd.GeoDataFrame(df_b)
time_left | feature 1 | geometry | index_right | time_right | feature 2 | iso_a3 | |
---|---|---|---|---|---|---|---|
105 | 2017-01-01 00:00:00 | 105 | POINT (-88.76707299981655 17.252033507246892) | 39 | 2017-01-01 00:00:00 | 39 | BLZ |
120 | 2017-01-01 00:00:00 | 120 | POINT (18.383001666953305 43.850022398954934) | 170 | 2017-01-01 00:00:00 | 170 | BIH |
283 | 2017-01-16 00:00:00 | 81 | POINT (-89.2049870794599 13.711947505494038) | 214 | 2017-01-16 00:00:00 | 37 | SLV |
527 | 2017-01-31 00:00:00 | 123 | POINT (44.06531001666542 9.56002239881775) | 521 | 2017-01-31 00:00:00 | 167 | -99 |
600 | 2017-01-31 00:00:00 | 196 | POINT (-74.08528981377441 4.598369421147822) | 386 | 2017-01-31 00:00:00 | 32 | COL |
861 | 2017-03-02 00:00:00 | 53 | POINT (-86.27043751890119 12.154962438756115) | 743 | 2017-03-02 00:00:00 | 35 | NIC |
1001 | 2017-03-02 00:00:00 | 193 | POINT (116.38633982565943 39.93083808990906) | 847 | 2017-03-02 00:00:00 | 139 | CHN |
1161 | 2017-03-17 00:00:00 | 151 | POINT (-69.90203094331503 18.472018713195382) | 902 | 2017-03-17 00:00:00 | 17 | DOM |
1451 | 2017-04-16 00:00:00 | 37 | POINT (10.179678099212026 36.80277813623144) | 1320 | 2017-04-16 00:00:00 | 81 | TUN |
1589 | 2017-04-16 00:00:00 | 175 | POINT (13.399602764700546 52.523764522251156) | 1360 | 2017-04-16 00:00:00 | 121 | DEU |
Upvotes: 1