Gabriel
Gabriel

Reputation: 989

pandas : Groupby and sum on specific column based on a mapping in another dataframe

How can I use a dataframe to do some groupby on the columns of another dataframe ?

I have the following dataframes :

   |col1|col2|col3|col4|col5|col6|col7
--------------------------------------
0  |1   |10  |10  |100 |100 |10  |100
1  |1   |10  |10  |100 |100 |10  |100

and

   |name  |group
  ---------------
0  |col1  |group1
1  |col2  |group2
2  |col3  |group2
3  |col4  |group3
4  |col5  |group3
5  |col6  |group2
6  |col7  |group3

I want to use the mapping on the second dataframe to do a groupby and sum on the first dataframe. I would like to obtain

   |group1|group2|group3
--------------------------
0  |1     |30    |300
1  |1     |30    |300

col2, col3 and col6 are summed together because they are mapped to group2. col4, col5 and col7 are summed together because they are mapped to group3. col1 is mapped to group1

To recreate the dataframes :

# initial dataframe
df = pd.DataFrame(
    {
        "col1": [1, 1],
        "col2": [10, 10],
        "col3": [10, 10],
        "col4": [100, 100],
        "col5": [100, 100],
        "col6": [10, 10],
        "col7": [100, 100],
    }
)
# dataframe containing all the mappings
mapping = pd.DataFrame(
    {
        "name": ["col1", "col2", "col3", "col4", "col5", "col6", "col7"],
        "group": ["group1", "group2", "group2", "group3", "group3", "group2", "group3"],
    }
)
# dataframe I want to obtain
final = pd.DataFrame(
    {
        "group1": [1, 1],
        "group2": [30, 30],
        "group3": [300, 300],
    }
)

Upvotes: 1

Views: 593

Answers (1)

jezrael
jezrael

Reputation: 863301

Use Index.map by columns names and then aggregate sum:

s = mapping.set_index('name')['group']

final = df.groupby(df.columns.map(s), axis=1).sum()
print (final)
   group1  group2  group3
0       1      30     300
1       1      30     300

Or use rename first and then aggregate:

s = mapping.set_index('name')['group']

final = df.rename(columns=s).groupby(level=0, axis=1).sum()
print (final)
   group1  group2  group3
0       1      30     300
1       1      30     300

Output is different if values not exist in mappings:

# dataframe containing all the mappings (removed last 2 values)
mapping = pd.DataFrame(
    {
        "name": ["col1", "col2", "col3", "col4", "col5"],
        "group": ["group1", "group2", "group2", "group3", "group3"],
    }
)

s = mapping.set_index('name')['group']

#only matched values are processing
final = df.groupby(df.columns.map(s), axis=1).sum()
print (final)
   group1  group2  group3
0       1      20     200
1       1      20     200

#not matched columns are added to ouput and not changed
final = df.rename(columns=s).groupby(level=0, axis=1).sum()
print (final)
   col6  col7  group1  group2  group3
0    10   100       1      20     200
1    10   100       1      20     200

Upvotes: 3

Related Questions