Reputation: 2520
I want to know whether the following calculation could be simplified or alternative calculations could be used:
id | group | value
01 | 1 | 5
01 | 1 | 6
02 | 1 | 35
02 | 1 | 40
03 | 1 | 90
03 | 1 | 95
control | 1 | 50
control | 1 | 60
04 | 2 | 35
04 | 2 | 36
05 | 2 | 15
05 | 2 | 10
06 | 2 | 20
06 | 2 | 25
control | 2 | 30
control | 2 | 40
First an average of all controls per group is calculated:
id | group | value_mean
control | 1 | 55
control | 2 | 35
Then the mean difference relative to the control for each id is calculated:
id | group | value_mean_percent
01 | 1 | 10
02 | 1 | 68
03 | 1 | 168
04 | 2 | 101
05 | 2 | 36
06 | 2 | 64
Example for id=1 (5+6)/2*100/55 = 10 %
Is there another way to interpret the results of such data?
Upvotes: 1
Views: 92
Reputation: 46888
I think you can define a function that works on a subgroup of the data frame, where it groups value
by id
, calculates mean and returns everything divided by control:
def func(da):
da = da.groupby('id')['value'].mean()
return 100*da[da.index != "control"]/da['control']
Then you just group the data.frame by group
, apply this function, and reset index:
import pandas as pd
import numpy as np
df = pd.DataFrame({"id":np.repeat(['01','02','03','control','04','05','06','control'],2),
"group":np.repeat([1,2],8),
"value":[5,6,35,40,90,95,50,60,35,36,15,10,20,25,30,40]})
df.groupby('group').apply(func).reset_index()
group id value
0 1 01 10.000000
1 1 02 68.181818
2 1 03 168.181818
3 2 04 101.428571
4 2 05 35.714286
5 2 06 64.285714
Upvotes: 1