Reputation: 10033
I have this df1:
df1 = pd.DataFrame({'Player':['Zico', 'Leonidas', 'Didi'],
'Team': ['Flamengo', 'Flamengo', 'Botafogo'],
'Position': ['MID', 'DEF', 'MID'],
'Games_Away': [4, 4, 4]})
And another df2 with a different numbers of rows, where all players on the first df are present.
df2 = pd.DataFrame({'Player':['Zico', 'Leonidas', 'Didi', 'Gerson', 'Pele'],
'Team': ['Flamengo', 'Flamengo', 'Botafogo', 'Botafogo', 'Santos'],
'Position': ['MID', 'DEF', 'MID', 'MID', 'FWD'],
'Games_Home': [3, 4, 3, 1, 1]})
How do I merge these two dfs, in order to end up with a new column 'Total_Games' that will add up correctly?
Player Team Position Games_Home Games_Away Total_Games
0 Zico Flamengo MID 3 4 7
1 Leonidas Flamengo DEF 4 4 8
2 Didi Botafogo MID 3 4 7
3 Gerson Botafogo MID 1 0 1
4 Pele Santos FWD 1 0 1
I have tried:
df_merge = df1.merge(df2, on="Player", how = 'inner')
df_merge['Total_Games']= df1['Games_Away'] + df2['Games_Home']
But this gives me:
Player Team_x Position_x Games_Away Team_y Position_y Games_Home Total_Games
0 Zico Flamengo MID 4 Flamengo MID 3 7.0
1 Leonidas Flamengo DEF 4 Flamengo DEF 4 8.0
2 Didi Botafogo MID 4 Botafogo MID 3 7.0
Problems:
What is the best way of joining the dfs, keeping all players, not having duplicate columns and summing correctly 'Games_Away' + 'Games_Home', given the fact that 1 + 'NaN' must be 1?
Upvotes: 0
Views: 41
Reputation: 323226
Do outer merge
df=df1.merge(df2,on=['Player','Team','Position'],how='outer').fillna(0)
df['Game_total']=df.Games_Away+df.Games_Home
df
Out[241]:
Player Team Position Games_Away Games_Home Game_total
0 Zico Flamengo MID 4.0 3 7.0
1 Leonidas Flamengo DEF 4.0 4 8.0
2 Didi Botafogo MID 4.0 3 7.0
3 Gerson Botafogo MID 0.0 1 1.0
4 Pele Santos FWD 0.0 1 1.0
Upvotes: 1