Reputation: 309
I have a dataframe with game matchups by week and a second dataframe with final scores. I want to update the first with scores for each team from the second with condition that game was played in week 1.
df1 = pd.DataFrame([[1,'aa','hh','',''],
[1,'bb','ii','',''],
[2,'cc','jj','',''],
[1,'dd','kk','',''],
[1,'ee','ll','',''],
[1,'ff','mm','',''],
[2,'gg','nn','','']], columns=['week','team1','team2','score1','score2'])
df1
Out[3]:
week team1 team2 score1 score2
0 1 aa hh
1 1 bb ii
2 2 cc jj
3 1 dd kk
4 1 ee ll
5 1 ff mm
6 2 gg nn
df2 = pd.DataFrame([[1,'aa', 24],
[1,'bb', 27],
[2,'cc', 20],
[1,'dd', 7],
[1,'ee', 9],
[1,'ff', 20],
[2,'gg', 0],
[1,'hh', 10],
[1,'ii', 3],
[2,'jj', 21],
[1,'kk', 20],
[1,'ll', 13],
[1,'mm', 19],
[2,'nn', 14]], columns=['week','team','score'])
df2
Out[5]:
week team score
0 1 aa 24
1 1 bb 27
2 2 cc 20
3 1 dd 7
4 1 ee 9
5 1 ff 20
6 2 gg 0
7 1 hh 10
8 1 ii 3
9 2 jj 21
10 1 kk 20
11 1 ll 13
12 1 mm 19
13 2 nn 14
I tried renaming columns to match and used .update
df2.columns = ['week','team1','score1']
df1.update(df2.loc[(df2['week']== 1)])
df1
Out[7]:
week team1 team2 score1 score2
0 1.0 aa hh 24
1 1.0 bb ii 27
2 2.0 cc jj
3 1.0 dd kk 7
4 1.0 ee ll 9
5 1.0 ff mm 20
6 2.0 gg nn
which gives result I was hoping for but actually doesn't work. When I try to rename again to update score2 I see that it is updating with the first rows from df2 instead of matching them with values in df1. I tried merge but it creates new columns rather than update existing ones. I plan to add new games to df1 and perform this update weekly and I want to update rather than create new columns. What is a way to accomplish this?
my desired output for this example is:
df1
Out[28]:
week team1 team2 score1 score2
0 1 aa hh 24 10
1 1 bb ii 27 3
2 2 cc jj
3 1 dd kk 7 20
4 1 ee ll 9 13
5 1 ff mm 20 19
6 2 gg nn
Upvotes: 1
Views: 678
Reputation: 10624
You can do it with merge:
result=df1.merge(df2, left_on='team1', right_on='team').merge(df2, left_on='team2', right_on='team')[['week_x', 'team1', 'team2', 'score_x', 'score_y']]
result.columns=['week', 'team1', 'team2', 'score1', 'score2']
result.loc[result['week']!=1,['score1', 'score2']]=np.nan
print(result)
Output:
week team1 team2 score1 score2
0 1 aa hh 24 10
1 1 bb ii 27 3
2 2 cc jj 20 21
3 1 dd kk 7 20
4 1 ee ll 9 13
5 1 ff mm 20 19
6 2 gg nn 0 14
Upvotes: 0
Reputation: 150805
Let's try map
instead:
to_map = df2[df2.week==1].set_index('team')['score']
to_update = df1.week==1
df1.loc[to_update, 'score1'] = df1.loc[to_update,'team1'].map(to_map)
df1.loc[to_update, 'score2'] = df1.loc[to_update,'team2'].map(to_map)
Output:
week team1 team2 score1 score2
0 1 aa hh 24 10
1 1 bb ii 27 3
2 2 cc jj
3 1 dd kk 7 20
4 1 ee ll 9 13
5 1 ff mm 20 19
6 2 gg nn
Upvotes: 1