Ismail
Ismail

Reputation: 288

Conditional date join in python Pandas

I have two pandas dataframes matches with columns (match_id, team_id,date, ...) and teams_att with columns (id, team_id, date, overall_rating, ...). I want to join the two dataframes on matches.team_id = teams_att.team_id and teams_att.date closest to matches.date

Example

matches

match_id    team_id     date
1           101         2012-05-17
2           101         2014-07-11
3           102         2010-05-21
4           102         2017-10-24

teams_att

id  team_id     date        overall_rating
1   101         2010-02-22  67
2   101         2011-02-22  69
3   101         2012-02-20  73
4   101         2013-09-17  79
5   101         2014-09-10  74
6   101         2015-08-30  82
7   102         2015-03-21  42
8   102         2016-03-22  44

Desired results

match_id    team_id     matches.date    teams_att.date      overall_rating
1           101         2012-05-17      2012-02-20          73
2           101         2014-07-11      2014-09-10          74
3           102         2010-05-21      2015-03-21          42
4           102         2017-10-24      2016-03-22          44

Upvotes: 2

Views: 81

Answers (2)

BENY
BENY

Reputation: 323396

We using merge_asof (Please check Scott's answer, that is the right way for solving this type problem :-) cheers )

g1=df1.groupby('team_id')
g=df.groupby('team_id')

l=[]
for x in [101,102]:
   l.append(pd.merge_asof(g.get_group(x),g1.get_group(x),on='date',direction ='nearest'))


pd.concat(l)
Out[405]: 
   match_id  team_id_x       date  id  team_id_y  overall_rating
0         1        101 2012-05-17   3        101              73
1         2        101 2014-07-11   5        101              74
0         3        102 2010-05-21   7        102              42
1         4        102 2017-10-24   8        102              44

Upvotes: 1

Scott Boston
Scott Boston

Reputation: 153560

You can use merge_asof with by and direction parameters:

pd.merge_asof(matches.sort_values('date'), 
              teams_att.sort_values('date'), 
              on='date', by='team_id', 
              direction='nearest')

Output:

   match_id  team_id       date  id  overall_rating
0         3      102 2010-05-21   7              42
1         1      101 2012-05-17   3              73
2         2      101 2014-07-11   5              74
3         4      102 2017-10-24   8              44

Upvotes: 2

Related Questions