Reputation: 989
How can I use a dataframe to do some groupby on the columns of another dataframe ?
I have the following dataframes :
|col1|col2|col3|col4|col5|col6|col7
--------------------------------------
0 |1 |10 |10 |100 |100 |10 |100
1 |1 |10 |10 |100 |100 |10 |100
and
|name |group
---------------
0 |col1 |group1
1 |col2 |group2
2 |col3 |group2
3 |col4 |group3
4 |col5 |group3
5 |col6 |group2
6 |col7 |group3
I want to use the mapping on the second dataframe to do a groupby and sum on the first dataframe. I would like to obtain
|group1|group2|group3
--------------------------
0 |1 |30 |300
1 |1 |30 |300
col2, col3 and col6 are summed together because they are mapped to group2. col4, col5 and col7 are summed together because they are mapped to group3. col1 is mapped to group1
To recreate the dataframes :
# initial dataframe
df = pd.DataFrame(
{
"col1": [1, 1],
"col2": [10, 10],
"col3": [10, 10],
"col4": [100, 100],
"col5": [100, 100],
"col6": [10, 10],
"col7": [100, 100],
}
)
# dataframe containing all the mappings
mapping = pd.DataFrame(
{
"name": ["col1", "col2", "col3", "col4", "col5", "col6", "col7"],
"group": ["group1", "group2", "group2", "group3", "group3", "group2", "group3"],
}
)
# dataframe I want to obtain
final = pd.DataFrame(
{
"group1": [1, 1],
"group2": [30, 30],
"group3": [300, 300],
}
)
Upvotes: 1
Views: 593
Reputation: 863301
Use Index.map
by columns names and then aggregate sum
:
s = mapping.set_index('name')['group']
final = df.groupby(df.columns.map(s), axis=1).sum()
print (final)
group1 group2 group3
0 1 30 300
1 1 30 300
Or use rename
first and then aggregate:
s = mapping.set_index('name')['group']
final = df.rename(columns=s).groupby(level=0, axis=1).sum()
print (final)
group1 group2 group3
0 1 30 300
1 1 30 300
Output is different if values not exist in mappings
:
# dataframe containing all the mappings (removed last 2 values)
mapping = pd.DataFrame(
{
"name": ["col1", "col2", "col3", "col4", "col5"],
"group": ["group1", "group2", "group2", "group3", "group3"],
}
)
s = mapping.set_index('name')['group']
#only matched values are processing
final = df.groupby(df.columns.map(s), axis=1).sum()
print (final)
group1 group2 group3
0 1 20 200
1 1 20 200
#not matched columns are added to ouput and not changed
final = df.rename(columns=s).groupby(level=0, axis=1).sum()
print (final)
col6 col7 group1 group2 group3
0 10 100 1 20 200
1 10 100 1 20 200
Upvotes: 3