Jason Bourne
Jason Bourne

Reputation: 35

Pandas groupby the same column multiple times based on other category column

I have a dataframe that I am trying to use pandas.groupby on to get the sum. The values that I am grouping are as follows:

    order_id    otype   score
0   id1 1   1.23
1   id1 2   1.56
2   id1 3   1.65
3   id1 4   1.73
4   id1 5   0.89
5   id2 1   1.07
6   id2 2   1.38
7   id2 3   0.94
8   id2 4   0.72
9   id2 5   1.37
10  id3 1   1.04
11  id3 2   0.56
12  id3 3   0.78
13  id3 4   1.12
14  id3 5   0.84

and I would like to perform a .groupby operation on order_id based on category values of otype, I want to create groupings based on otype as follows:

otype in [1,2,3] --> class1
otype in [4,5] --> class2
otype in [1,2,3,4,5] --> class3
otype in [2,3,5] --> class4

I would like the (very) end result to be a table looking like this:

   order_id    otype   score_sum
0   id1 class1   4.4
1   id1 class2   2.62
2   id1 class3   7.06
3   id1 class4   4.1

One way I can think of doing this is to calculate the result for each class separately and then use pd.concat.But I don't think this is a computationally efficient way, nor is it a code-simple way.

Is there a better way?

Upvotes: 1

Views: 52

Answers (1)

jezrael
jezrael

Reputation: 863176

I think the best is test both solutions in real data - but in my opinion pd.concat solution is simplier and if few keys in dictionary should be faster:

d= {'class1': [1,2,3], 
    'class2': [4,5],
    'class3': [1,2,3,4,5],
    'class4': [2,3,5]}

df1 = pd.concat({k: df.loc[df['otype'].isin(v)]
                      .groupby('order_id').score.sum() for k, v in d.items()})
print (df1)
        order_id
class1  id1         4.44
        id2         3.39
        id3         2.38
class2  id1         2.62
        id2         2.09
        id3         1.96
class3  id1         7.06
        id2         5.48
        id3         4.34
class4  id1         4.10
        id2         3.69
        id3         2.18
Name: score, dtype: float64

Alternative solution:

df1 = (pd.concat({k:df.loc[df['otype'].isin(v),['order_id','score']].assign(**{'class':k}) 
                     for k, v in d.items()})
         .groupby(['class', 'order_id'])
         .score.sum())
print (df1)
class   order_id
class1  id1         4.44
        id2         3.39
        id3         2.38
class2  id1         2.62
        id2         2.09
        id3         1.96
class3  id1         7.06
        id2         5.48
        id3         4.34
class4  id1         4.10
        id2         3.69
        id3         2.18
Name: score, dtype: float64

for k, v in d.items():
    df.loc[df['otype'].isin(v), k] = k
    
df = (df.drop('otype',1)
        .melt(['order_id','score'])
        .groupby(['value','order_id'])['score'].sum())
print (df)
value   order_id
class1  id1         4.44
        id2         3.39
        id3         2.38
class2  id1         2.62
        id2         2.09
        id3         1.96
class3  id1         7.06
        id2         5.48
        id3         4.34
class4  id1         4.10
        id2         3.69
        id3         2.18
Name: score, dtype: float64

Upvotes: 1

Related Questions