user2896120

Reputation: 3282

Grouping column values together

I have a dataframe like so:

Class  price  demand
1       22       8
1       60       7
3       32       14
2       72       9
4       45       20
5       42       25

What I'd like to do is group classes 1-3 in one category and classes 4-5 in one category. Then I'd like to get the sum of price for each category and the sum of demand for each category. I'd like to also get the mean. The result should look something like this:

Class   TotalPrice   TotalDemand   AveragePrice  AverageDemand
P          186            38           46.5          9.5   
E          87             45           43.5          22.5

Where P is classes 1-3 and E is classes 4-5. How can I group by categories in pandas? Is there a way to do this?

Upvotes: 3

Views: 101

Answers (3)

piRSquared

Reputation: 294506

You can create a dictionary that defines your groups.

mapping = {**dict.fromkeys([1, 2, 3], 'P'), **dict.fromkeys([4, 5], 'E')}

If you pass a dictionary or callable to groupby, it is applied to the index values rather than to a column. So set the index to Class first:

d = df.set_index('Class').groupby(mapping).agg(['sum', 'mean']).sort_index(axis=1, level=1)

Finally, we do some tweaking to get column names the way you specified.

rename_dict = {'sum': 'Total', 'mean': 'Average'}
d.columns = d.columns.map(lambda c: f"{rename_dict[c[1]]}{c[0].title()}")

d.rename_axis('Class').reset_index()

  Class  TotalPrice  TotalDemand  AveragePrice  AverageDemand
0     E          87           45          43.5           22.5
1     P         186           38          46.5            9.5
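
On a recent pandas, the same table can also be built with named aggregation, which sets the output column names directly. This is a different technique from the MultiIndex renaming above. A minimal sketch, assuming the question's data; the names `category` and `result` are only illustrative:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Class": [1, 1, 3, 2, 4, 5],
    "price": [22, 60, 32, 72, 45, 42],
    "demand": [8, 7, 14, 9, 20, 25],
})

# Same grouping rule as above, written out in full.
mapping = {1: "P", 2: "P", 3: "P", 4: "E", 5: "E"}

# Map each Class to its category; the Series is then used as the grouping key.
category = df["Class"].map(mapping).rename("Class")

# Named aggregation: output_column=(input_column, function).
result = (
    df.groupby(category)
      .agg(
          TotalPrice=("price", "sum"),
          TotalDemand=("demand", "sum"),
          AveragePrice=("price", "mean"),
          AverageDemand=("demand", "mean"),
      )
      .reset_index()
)
print(result)
```

Because the grouping key is a Series aligned with the rows, this version does not need set_index.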

Upvotes: 4

ALollz

Reputation: 59579

In general, you can form arbitrary bins to group your data using pd.cut. Bins are closed on the right by default, so bins=[0, 3, 5] gives the intervals (0, 3] and (3, 5]:

import pandas as pd

pd.cut(df.Class, bins=[0, 3, 5], labels=['P', 'E'])
#0    P
#1    P
#2    P
#3    P
#4    E
#5    E

# Pass the functions as a list (not a set) so the column order is deterministic
df2 = (df.groupby(pd.cut(df.Class, bins=[0, 3, 5], labels=['P', 'E']))[['demand', 'price']]
         .agg(['sum', 'mean']).reset_index())

# Get rid of the multi-level columns
df2.columns = [f'{i}_{j}' if j != '' else f'{i}' for i,j in df2.columns]

Output:

  Class  demand_sum  demand_mean  price_sum  price_mean
0     P          38          9.5        186        46.5
1     E          45         22.5         87        43.5
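
For anyone who wants to run the binning approach end to end, here is a minimal self-contained sketch using the question's data. The names `groups` and `summary` are only illustrative, and passing `observed=True` is my own choice here (it keeps only the categories that actually occur), not part of the code above:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Class": [1, 1, 3, 2, 4, 5],
    "price": [22, 60, 32, 72, 45, 42],
    "demand": [8, 7, 14, 9, 20, 25],
})

# bins=[0, 3, 5] produces the right-closed intervals (0, 3] and (3, 5].
groups = pd.cut(df["Class"], bins=[0, 3, 5], labels=["P", "E"])

# Group by the binned labels and aggregate both columns.
summary = (
    df.groupby(groups, observed=True)[["price", "demand"]]
      .agg(["sum", "mean"])
)

# Flatten the two-level columns into names such as "price_sum".
summary.columns = [f"{col}_{func}" for col, func in summary.columns]
print(summary)
```

Note that a Class value outside the bins (for example 0 or 6) would be labelled NaN by pd.cut, so widen the edges if your data can contain such values.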

Upvotes: 2

chrisaycock

Reputation: 37928

In [8]: df.groupby(np.where(df['Class'].isin([1, 2, 3]), 'P', 'E'))[['price', 'demand']].agg(['sum', 'mean'])
Out[8]: 
  price       demand      
    sum  mean    sum  mean
E    87  43.5     45  22.5
P   186  46.5     38   9.5
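
To make the one-liner easier to follow, here is a sketch that splits it into two steps with the imports it needs. It assumes the question's data; the names `labels` and `out` are only illustrative:

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Class": [1, 1, 3, 2, 4, 5],
    "price": [22, 60, 32, 72, 45, 42],
    "demand": [8, 7, 14, 9, 20, 25],
})

# One label per row: 'P' where Class is 1, 2 or 3, otherwise 'E'.
labels = np.where(df["Class"].isin([1, 2, 3]), "P", "E")
print(labels)

# groupby accepts an array with one entry per row as the grouping key.
out = df.groupby(labels)[["price", "demand"]].agg(["sum", "mean"])
print(out)
```

Keep in mind that np.where puts every Class that is not 1, 2 or 3 into 'E', so this only fits when there are exactly two categories.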

Upvotes: 4
