Reputation: 368
I have two pandas data frames df1
and df2
like the following:
df1
containing all the data of type a
in increasing time order:
type Date
0 a 1970-01-01
1 a 2008-08-01
2 a 2009-07-24
3 a 2010-09-30
4 a 2011-09-29
5 a 2013-06-11
6 a 2013-12-17
7 a 2015-06-02
8 a 2016-06-14
9 a 2017-06-21
10 a 2018-11-26
11 a 2019-06-03
12 a 2019-12-16
df2
containing all the data of type b
in increasing time order:
type Date
0 b 2017-11-29
1 b 2018-05-30
2 b 2018-11-26
3 b 2019-06-03
4 b 2019-12-16
5 b 2020-06-18
6 b 2020-12-17
7 b 2021-06-28
A type a
entry and a type b
entry are determined as matching if the date difference between them is within one year. One type a
entry can only match with one other type b
entry, and vice versa. Time efficiently, how can I find the maximum amount of matching pairs in increasing time order like the following?
type1 Date1 type2 Date2
0 a 2017-06-21 b 2017-11-29
1 a 2018-11-26 b 2018-05-30
2 a 2019-06-03 b 2018-11-26
3 a 2019-12-16 b 2019-06-03
Upvotes: 1
Views: 42
Reputation: 862591
Use merge_asof
:
df3 = pd.merge_asof(df1.rename(columns={'Date':'Date2', 'type':'type1'}),
df2.rename(columns={'Date':'Date1', 'type':'type2'}),
left_on='Date2',
right_on='Date1',
direction='nearest',
allow_exact_matches=False,
tolerance=pd.Timedelta('365 days')).dropna(subset=['Date1'])
print (df3)
type1 Date2 type2 Date1
9 a 2017-06-21 b 2017-11-29
10 a 2018-11-26 b 2018-05-30
11 a 2019-06-03 b 2018-11-26
12 a 2019-12-16 b 2020-06-18
Upvotes: 1