Chester Cheng
Chester Cheng

Reputation: 158

unmelt pandas dataframe and create new columns for duplicate columns

I have a dataset consisting sports teams performance across the seasons, from which I have created a melted dataset separating the home and away teams.

Original dataset:

  Div      Date  HomeTeam   AwayTeam  FTHG  FTAG FTR  HTHG  HTAG HTR   Referee
0  E0  11/08/17   Arsenal  Leicester     4     3   H     2     2   D    M Dean
1  E0  12/08/17  Brighton   Man City     0     2   A     0     0   D  M Oliver
2  E0  12/08/17   Chelsea    Burnley     2     3   A     0     3   A  C Pawson

Melted dataset:

  Div        Date        HomeTeam      AwayTeam  FTHG  FTAG FTR  HTHG  HTAG  \
0  E0  2017-08-11         Arsenal     Leicester     4     3   H     2     2   
1  E0  2017-08-11         Arsenal     Leicester     4     3   H     2     2   
2  E0  2017-08-12         Watford     Liverpool     3     3   D     2     1   
3  E0  2017-08-12       West Brom   Bournemouth     1     0   H     1     0   
4  E0  2017-08-12  Crystal Palace  Huddersfield     0     3   A     0     2   

  HTR   Referee  Home/Away          Team        Opponent  
0   D    M Dean          1       Arsenal       Leicester  
1   D    M Dean          0     Leicester         Arsenal  
2   H  A Taylor          0     Liverpool         Watford  
3   H  R Madley          0   Bournemouth       West Brom  
4   A    J Moss          0  Huddersfield  Crystal Palace  

I have also added extra columns to calculate cumulative goals/ whatnot. So realistically, it looks like

         Date  Home/Away          Team        Opponent  Cumg  Cumc  Result  \
0  2017-08-11          1       Arsenal       Leicester   0.0   0.0       1   
1  2017-08-11          0     Leicester         Arsenal   0.0   0.0       1   
2  2017-08-12          0     Liverpool         Watford   0.0   0.0       2   
3  2017-08-12          0   Bournemouth       West Brom   0.0   0.0       1   
4  2017-08-12          0  Huddersfield  Crystal Palace   0.0   0.0       0   

   Cumw  Cuml  Cumd  Cumtr  win_streak  lose_streak  
0   0.0   0.0   0.0    0.0         0.0          0.0  
1   0.0   0.0   0.0    0.0         0.0          0.0  
2   0.0   0.0   0.0    0.0         0.0          0.0  
3   0.0   0.0   0.0    0.0         0.0          0.0  
4   0.0   0.0   0.0    0.0         0.0          0.0  

I would like to "unmelt" the dataset by returning to the original format, but keeping the new columns i have added. Like this

         Date  Home/Away     Team     Opponent  Cumg_team  Cumc_team  Result  \
0  2017-08-11          1  Arsenal    Leicester        0.0        0.0       1   
1  2017-08-19          0  Arsenal        Stoke        3.0        4.0       1   
2  2017-08-27          0  Arsenal    Liverpool        3.0        5.0       1   
3  2017-09-09          1  Arsenal  Bournemouth        3.0        9.0       1   
4  2017-09-17          0  Arsenal      Chelsea        3.0       12.0       2  

   Cumw_team  Cuml_team  Cumd  Cumtr_team  win_streak_team  lose_streak_team  \
0        0.0        0.0   0.0         0.0              0.0               0.0   
1        0.0        1.0   0.0         0.0              0.0               0.0   
2        0.0        2.0   0.0         0.0              1.0               0.0   
3        0.0        3.0   0.0         0.0              0.0               0.0   
4        0.0        4.0   0.0         0.0              0.0               0.0  

   Cumw_opponent  Cuml_opponent  Cumg_opponent  Cumc_opponent  Cumtr_opponent  \
0            0.0            0.0            0.0            0.0             0.0   
1            0.0            1.0            0.0            1.0             0.0   
2            0.0            1.0            3.0            4.0             1.0   
3            2.0            1.0            4.0            2.0             6.0   
4            3.0            1.0            7.0            6.0             9.0  

   win_streak_opponent  lose_streak_opponent  
0                  0.0                   0.0  
1                  0.0                   0.0  
2                  0.0                   0.0  
3                  0.0                   0.0  
4                  0.0                   0.0

I could do this by

df1 = df[df['Team']=='Arsenal'].set_index('Date')
df2 = df[df['Opponent']=='Arsenal].set_index('Date')

df3 = df1.join(df2).reset_index()

But that's not very efficient, i wonder if there is anyway i can do this with purej SQL-like manipulation of the pandas df?

Upvotes: 0

Views: 336

Answers (1)

jezrael
jezrael

Reputation: 862601

It seems you need:

df1 = df.set_index(['Date', 'Team'])
df2 = df.set_index(['Date', 'Opponent'])

df3 = pd.concat([df1, df2], axis=1)

Upvotes: 1

Related Questions