adrian

Reputation: 2376

Pandas: Aggregate differently based on group

Let's say I have some data as follows:

patient_id  lab_type  value
1           food       10
1           food       8
2           food       3
2           food       5
1           shot       4
1           shot       10
2           shot       2
2           shot       4

Then I group the data with groupby(['patient_id', 'lab_type']).

After that, I'd like to aggregate value, but differently for each lab_type: for food I'd like to use the mean, and for shot I'd like to use the sum.

The final data should look like this:

  patient_id  lab_type  value
  1           food       9 (10 + 8 / 2)
  2           food       4 (3 + 5 / 2)
  1           shot       14 (10 + 4)
  2           shot       6 (2 + 4)

Upvotes: 2

Views: 417

Answers (4)

jezrael

Reputation: 863156

I tried modifying john's answer:

You can apply mean and sum to each subset separately, then combine the results with concat and reset_index:

print(df)
   patient_id lab_type  value
0           1     food     10
1           1     food      8
2           2     food      3
3           2     food      5
4           1     shot      4
5           1     shot     10
6           2     shot      2
7           2     shot      4


# Select only the numeric column so mean() is not applied to the lab_type strings
df1 = df[df.lab_type == "food"].groupby(['patient_id'])[['value']].mean()
df1['lab_type'] = 'food'
print(df1)
            value lab_type
patient_id                
1               9     food
2               4     food

# Same idea for the shot rows, but summing instead of averaging
df2 = df[df.lab_type == "shot"].groupby(['patient_id'])[['value']].sum()
df2['lab_type'] = 'shot'
print(df2)
            value lab_type
patient_id                
1              14     shot
2               6     shot

print(pd.concat([df1, df2]).reset_index())
   patient_id  value lab_type
0           1      9     food
1           2      4     food
2           1     14     shot
3           2      6     shot
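
The same split-and-concat idea can be written once for any number of lab types. This is only a minimal sketch: the agg_by_lab mapping is a hypothetical name introduced here for illustration, and the sample frame is rebuilt from the question's data so the snippet runs on its own.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "patient_id": [1, 1, 2, 2, 1, 1, 2, 2],
    "lab_type": ["food"] * 4 + ["shot"] * 4,
    "value": [10, 8, 3, 5, 4, 10, 2, 4],
})

# Hypothetical mapping (not part of the question): lab_type -> aggregation name
agg_by_lab = {"food": "mean", "shot": "sum"}

parts = []
for lab, how in agg_by_lab.items():
    # Aggregate only the rows of this lab type, per patient
    part = (
        df[df["lab_type"] == lab]
        .groupby("patient_id")["value"]
        .agg(how)
        .reset_index()
    )
    # Put the lab_type label back, since the filter removed the need to group on it
    part.insert(1, "lab_type", lab)
    parts.append(part)

result = pd.concat(parts, ignore_index=True)
print(result)
```

Adding another lab type would then only need one more entry in the mapping.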

Upvotes: 1

miraculixx

Reputation: 10359

On food I'd like to aggregate using mean and on shot I'd like to aggregate using sum.

Just use .apply and pass a custom function:

import numpy as np

def calc(g):
    # g is the sub-DataFrame for one (patient_id, lab_type) group
    if g.iloc[0].lab_type == 'shot':
        return g.value.sum()
    else:
        return np.mean(g.value)

result = df.groupby(['patient_id', 'lab_type']).apply(calc)

Here calc receives the per-group DataFrame, following pandas' split-apply-combine model. The result is what you want:

patient_id  lab_type
1           food         9
            shot        14
2           food         4
            shot         6
dtype: float64
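
If calling a Python function per group turns out to be slow on larger data, one possible alternative is to compute both statistics for every group and then pick the right one per row. This is a sketch under that assumption, not a benchmarked recommendation; the stats name is introduced here for illustration and the frame is rebuilt from the question's data.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "patient_id": [1, 1, 2, 2, 1, 1, 2, 2],
    "lab_type": ["food"] * 4 + ["shot"] * 4,
    "value": [10, 8, 3, 5, 4, 10, 2, 4],
})

# Compute mean and sum for every (patient_id, lab_type) group in one pass
stats = (
    df.groupby(["patient_id", "lab_type"])["value"]
    .agg(["mean", "sum"])
    .reset_index()
)

# Keep the sum for shot rows and the mean for everything else
stats["value"] = np.where(stats["lab_type"] == "shot", stats["sum"], stats["mean"])

result = stats[["patient_id", "lab_type", "value"]]
print(result)
```

The trade-off is that one of the two statistics is computed and then thrown away for each group.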

Upvotes: 1

Markus Weninger

Reputation: 12668

The answer in this post looks promising. Starting from it, I came up with the following code, which should work for you.

Test data:

import pandas as pd

data = [{"A" : 1, "B" : "food", "C" : 10},
        {"A" : 1, "B" : "food", "C" : 8},
        {"A" : 2, "B" : "food", "C" : 3},
        {"A" : 2, "B" : "food", "C" : 5},
        {"A" : 1, "B" : "shot", "C" : 4},
        {"A" : 1, "B" : "shot", "C" : 10},
        {"A" : 2, "B" : "shot", "C" : 2},
        {"A" : 2, "B" : "shot", "C" : 4}]
df = pd.DataFrame(data)

Actual code:

res = df.groupby(['A', 'B']).apply(
  lambda x: pd.Series(
    {"value" : x.C.mean() if x.iloc[0].B == "food" else x.C.sum()}
  )
)

This results in:

        value
A B          
1 food      9
  shot     14
2 food      4
  shot      6

Upvotes: 0

john mangual

Reputation: 8172

Let P be your DataFrame.

P[P.lab_type == "food"].groupby(['patient_id'])['value'].aggregate('mean')

Do the same for the shot group, using 'sum' instead of 'mean', and concatenate the results.

Upvotes: 0
