Reputation: 105
I have two dataframes that look like this:
#df1
id | minyear
x 01/01/2019
y 01/02/2019
z 01/03/2019
#df2
id2 | date
xx 01/01/2018
xx 01/02/2019
xx 01/01/2020
yy 01/02/2018
yy 01/03/2019
yy 01/04/2020
zz 01/01/2019
zz 01/05/2020
What I want to do is if the id column is a substring of id2 then I would like to to only keep the values in df2 that are greater than its minyear value. So the final dataframe should look like this:
id2 | date
xx 01/02/2019
xx 01/01/2020
yy 01/03/2019
yy 01/04/2020
zz 01/05/2020
Upvotes: 1
Views: 33
Reputation: 195428
Try:
# convert the columns to datetime (skip if they are converted already):
df1.minyear = pd.to_datetime(df1.minyear)
df2.date = pd.to_datetime(df2.date)
x = df1.merge(df2, how="cross")
x["tmp"] = x.apply(lambda r: r["id"] in r["id2"], axis=1)
x = x[x.tmp & x.date.ge(x.minyear)][["id2", "date"]]
print(x)
Prints:
id2 date
1 xx 2019-01-02
2 xx 2020-01-01
12 yy 2019-01-03
13 yy 2020-01-04
23 zz 2020-01-05
Upvotes: 1