Reputation: 97
Compare the values in two columns and extract the values of a third column in a dataframe
df =
Location | teams | goals |
---|---|---|
1 | A | 5 |
1 | B | 6 |
2 | A | 7 |
2 | B | 5 |
2 | C | 6 |
3 | B | 7 |
Expected output df =
Location | A Team | A Team goals | B Team | B Team Goals | C Team | C Team goals |
---|---|---|---|---|---|---|
1 | 1 | 5 | 1 | 6 | 0 | NA |
2 | 1 | 7 | 1 | 5 | 1 | 6 |
3 | 0 | NA | 1 | 7 | 0 | NA |
Upvotes: 2
Views: 100
Reputation: 71689
Reshape the dataframe using pivot to get the goals. Check for the non null values in goals
to identify the teams
then join
to get the result
goals = df.pivot(*df.columns)
teams = s.notna().astype(int)
teams.add_suffix(' Team').join(goals.add_suffix(' Team Goals'))
Result
teams A Team B Team C Team A Team Goals B Team Goals C Team Goals
Location
1 1 1 0 5.0 6.0 NaN
2 1 1 1 7.0 5.0 6.0
3 0 1 0 NaN 7.0 NaN
Upvotes: 1
Reputation: 13212
Example Code
data = {'Location': {0: 1, 1: 1, 2: 2, 3: 2, 4: 2, 5: 3},
'teams': {0: 'A', 1: 'B', 2: 'A', 3: 'B', 4: 'C', 5: 'B'},
'goals': {0: 5, 1: 6, 2: 7, 3: 5, 4: 6, 5: 7}}
df = pd.DataFrame(data)
First
aggregation with groupby
(df.groupby(['Location', 'teams'])['goals'].agg(['count', sum])
.unstack().swaplevel(0, 1, axis=1).sort_index(axis=1))
output:
teams A B C
count sum count sum count sum
Location
1 1.0 5.0 1.0 6.0 NaN NaN
2 1.0 7.0 1.0 5.0 1.0 6.0
3 NaN NaN 1.0 7.0 NaN NaN
Second
Let's create idx
to change columns
idx = pd.MultiIndex.from_product([df['teams'].unique(), ['Team', 'Team Goal']]).map(lambda x: ' '.join(x))
idx
Index(['A Team', 'A Team Goal', 'B Team', 'B Team Goal', 'C Team', 'C Team Goal'], dtype='object')
Last
change columns and reset_index
(include First code)
(df.groupby(['Location', 'teams'])['goals'].agg(['count', sum])
.unstack().swaplevel(0, 1, axis=1).sort_index(axis=1)
.set_axis(idx, axis=1).reset_index())
output
Location A Team A Team Goal B Team B Team Goal C Team C Team Goal
0 1 1.0 5.0 1.0 6.0 NaN NaN
1 2 1.0 7.0 1.0 5.0 1.0 6.0
2 3 NaN NaN 1.0 7.0 NaN NaN
Upvotes: 1