Compare the values in two columns and extract the values of a third column in a dataframe

Compare the values in two columns and extract the values of a third column in a dataframe

df =

Location teams goals
1 A 5
1 B 6
2 A 7
2 B 5
2 C 6
3 B 7

Expected output df =

Location A Team A Team goals B Team B Team Goals C Team C Team goals
1 1 5 1 6 0 NA
2 1 7 1 5 1 6
3 0 NA 1 7 0 NA

Upvotes: 2

Views: 100

Answers (2)

Shubham Sharma
Shubham Sharma

Reputation: 71689

Reshape the dataframe using pivot to get the goals. Check for the non null values in goals to identify the teams then join to get the result

goals = df.pivot(*df.columns)
teams = s.notna().astype(int)

teams.add_suffix(' Team').join(goals.add_suffix(' Team Goals'))

Result

teams     A Team  B Team  C Team  A Team Goals  B Team Goals  C Team Goals
Location                                                                  
1              1       1       0           5.0           6.0           NaN
2              1       1       1           7.0           5.0           6.0
3              0       1       0           NaN           7.0           NaN

Upvotes: 1

Panda Kim
Panda Kim

Reputation: 13212

Example Code

data = {'Location': {0: 1, 1: 1, 2: 2, 3: 2, 4: 2, 5: 3},
        'teams': {0: 'A', 1: 'B', 2: 'A', 3: 'B', 4: 'C', 5: 'B'},
        'goals': {0: 5, 1: 6, 2: 7, 3: 5, 4: 6, 5: 7}}
df = pd.DataFrame(data)

First

aggregation with groupby

(df.groupby(['Location', 'teams'])['goals'].agg(['count', sum])
 .unstack().swaplevel(0, 1, axis=1).sort_index(axis=1))

output:

teams   A               B               C
        count   sum     count   sum     count   sum
Location                        
1       1.0     5.0     1.0     6.0     NaN     NaN
2       1.0     7.0     1.0     5.0     1.0     6.0
3       NaN     NaN     1.0     7.0     NaN     NaN



Second

Let's create idx to change columns

idx = pd.MultiIndex.from_product([df['teams'].unique(), ['Team', 'Team Goal']]).map(lambda x: ' '.join(x))

idx

Index(['A Team', 'A Team Goal', 'B Team', 'B Team Goal', 'C Team', 'C Team Goal'], dtype='object')



Last

change columns and reset_index (include First code)

(df.groupby(['Location', 'teams'])['goals'].agg(['count', sum])
 .unstack().swaplevel(0, 1, axis=1).sort_index(axis=1)
 .set_axis(idx, axis=1).reset_index())

output

    Location    A Team  A Team Goal B Team  B Team Goal C Team  C Team Goal
0   1           1.0     5.0         1.0     6.0         NaN     NaN
1   2           1.0     7.0         1.0     5.0         1.0     6.0
2   3           NaN     NaN         1.0     7.0         NaN     NaN

Upvotes: 1

Related Questions