I am doing the following:
def percentage(x):
    return x[(x<=5)].count() / x.count() * 100

full_data = full_data.groupby(['Id', 'Week_id'], as_index=False).agg({'Volume': percentage})
But I want to repeat this groupby successively with other thresholds in the percentage function, such as x<=7, x<=9, x<=11, etc.
What is the easiest way to do this instead of writing multiple functions and calling them?
So basically I want to avoid doing something like this:
def percentage_1(x):
    return x[(x<=5)].count() / x.count() * 100

full_data_1 = full_data.groupby(['Id', 'Week_id'], as_index=False).agg({'Volume': percentage_1})

def percentage_2(x):
    return x[(x<=7)].count() / x.count() * 100

full_data_2 = full_data.groupby(['Id', 'Week_id'], as_index=False).agg({'Volume': percentage_2})
# etc.
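For reference, one way to avoid a separate function per threshold is to give `percentage` a second parameter and loop over the thresholds. This is a minimal sketch, assuming that `SeriesGroupBy.agg` forwards extra keyword arguments to the function; the `threshold` parameter, the `results` dict and the sample data (taken from the GitHub example further down) are illustrative, not part of the question.

```python
import pandas as pd

# Illustrative sample data with the same columns as in the question
full_data = pd.DataFrame({
    'Id': list('XXYYZZXYZX'),
    'Volume': [2, 4, 8, 1, 2, 5, 8, 2, 6, 4],
    'Week_id': list('aaabbbabac'),
})

def percentage(x, threshold):
    # Share of values in the group that are <= threshold, in percent
    return x[x <= threshold].count() / x.count() * 100

results = {}
for c in [5, 7, 9, 11]:
    # Extra keyword arguments given to agg() are passed on to percentage()
    results[c] = (
        full_data.groupby(['Id', 'Week_id'])['Volume']
        .agg(percentage, threshold=c)
        .reset_index()
    )

print(results[5])
```

Each entry of `results` plays the role of one of the `full_data_1`, `full_data_2`, ... frames above, e.g. `results[7]` for `x<=7`.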
I came up with this as the most concise solution to my question:
def percentage(x):
    global c  # not strictly required: c is only read here, never assigned
    return x[(x<=c)].count() / x.count() * 100

c=5
full_data_5 = full_data.groupby(['Id', 'Week_id'], as_index=False).agg({'Volume': percentage})
c=7
full_data_7 = full_data.groupby(['Id', 'Week_id'], as_index=False).agg({'Volume': percentage})
c=9
full_data_9 = full_data.groupby(['Id', 'Week_id'], as_index=False).agg({'Volume': percentage})
# etc
However, this relies on a global variable, which is quite a controversial practice.
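The global can be avoided with `functools.partial` from the standard library, which binds the threshold to a two-argument function and returns a one-argument callable that `agg` accepts. This sketch assumes a hypothetical `percentage(x, c)` signature and reuses the sample data from the GitHub example below; it is an alternative, not the approach used above.

```python
from functools import partial

import pandas as pd

# Sample data from the GitHub example
full_data = pd.DataFrame({
    'Id': list('XXYYZZXYZX'),
    'Volume': [2, 4, 8, 1, 2, 5, 8, 2, 6, 4],
    'Week_id': list('aaabbbabac'),
})

def percentage(x, c):
    # Percentage of values in the group that are <= c (c is now a parameter)
    return x[x <= c].count() / x.count() * 100

# partial(percentage, c=5) behaves like lambda x: percentage(x, c=5)
full_data_5 = full_data.groupby(['Id', 'Week_id'], as_index=False).agg({'Volume': partial(percentage, c=5)})
full_data_7 = full_data.groupby(['Id', 'Week_id'], as_index=False).agg({'Volume': partial(percentage, c=7)})
print(full_data_5)
```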
You can rewrite your function: create a new column holding a boolean mask, aggregate it with mean (which gives the fraction of True values per group), and finally multiply by 100 with Series.mul:
n = 3
full_data['new'] = full_data['Volume'] <= n
full_data = full_data.groupby(['Id', 'Week_id'])['new'].mean().mul(100).reset_index()
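The mask-and-mean trick works because pandas treats `True` as 1 and `False` as 0, so the mean of a boolean column is the fraction of rows where the condition holds. A small illustration with the values 2, 4 and 8 (the `X`/`a` group of the sample data used later) and a threshold of 5:

```python
import pandas as pd

volume = pd.Series([2, 4, 8])
mask = volume <= 5        # True, True, False
pct = mask.mean() * 100   # (1 + 1 + 0) / 3 * 100, roughly 66.67
print(pct)
```

This matches `x[(x<=5)].count() / x.count() * 100` from the question, but the comparison is vectorized over the whole column instead of running inside a Python function per group.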
The same solution wrapped in a function:
def per(df, n):
    # assign() returns a new DataFrame, so the caller's df is not modified
    out = df.assign(new=df['Volume'] <= n)
    return out.groupby(['Id', 'Week_id'])['new'].mean().mul(100).reset_index()
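Since the question asks for several thresholds, the same boolean-mean idea can produce all of them in a single `groupby` by creating one mask column per threshold. This is a sketch: the threshold list, the `le_<n>` column names and the sample data are illustrative assumptions.

```python
import pandas as pd

full_data = pd.DataFrame({
    'Id': list('XXYYZZXYZX'),
    'Volume': [2, 4, 8, 1, 2, 5, 8, 2, 6, 4],
    'Week_id': list('aaabbbabac'),
})

thresholds = [5, 7, 9, 11]
# One boolean column per threshold, e.g. le_5 is True where Volume <= 5
masks = {f'le_{n}': full_data['Volume'] <= n for n in thresholds}

out = (
    full_data.assign(**masks)
    .groupby(['Id', 'Week_id'])[list(masks)]
    .mean()     # fraction of True values per group and threshold
    .mul(100)   # convert fractions to percentages
    .reset_index()
)
print(out)
```

The result has one row per (`Id`, `Week_id`) pair and one percentage column per threshold, instead of a separate DataFrame per threshold.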
EDIT: Solution from GitHub:
import pandas as pd

full_data = pd.DataFrame({
    'Id': list('XXYYZZXYZX'),
    'Volume': [2, 4, 8, 1, 2, 5, 8, 2, 6, 4],
    'Week_id': list('aaabbbabac')
})
print(full_data)
val = 5

def per(c):
    # Closure: f1 keeps the threshold c it was created with, so no global is needed
    def f1(x):
        return x[(x<=c)].count() / x.count() * 100
    return f1

full_data2 = full_data.groupby(['Id', 'Week_id']).agg({'Volume': per(val)}).reset_index()
print(full_data2)
  Id Week_id      Volume
0  X       a   66.666667
1  X       c  100.000000
2  Y       a    0.000000
3  Y       b  100.000000
4  Z       a    0.000000
5  Z       b  100.000000
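Because `per(c)` builds a new function for each threshold, it can also be called in a loop and the per-threshold results joined side by side with `pd.concat`. The sketch below repeats `per` so it runs on its own; the `le_<c>` column names and the thresholds 5, 7 and 9 are illustrative.

```python
import pandas as pd

full_data = pd.DataFrame({
    'Id': list('XXYYZZXYZX'),
    'Volume': [2, 4, 8, 1, 2, 5, 8, 2, 6, 4],
    'Week_id': list('aaabbbabac'),
})

def per(c):
    # Closure: f1 remembers the threshold c it was created with
    def f1(x):
        return x[x <= c].count() / x.count() * 100
    return f1

# One Series per threshold, all indexed by (Id, Week_id)
frames = {
    f'le_{c}': full_data.groupby(['Id', 'Week_id'])['Volume'].agg(per(c))
    for c in (5, 7, 9)
}
# Put the Series next to each other; the dict keys become the column names
full_data3 = pd.concat(frames, axis=1).reset_index()
print(full_data3)
```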
For comparison, the original function reading the module-level val gives the same result:
def percentage(x):
    return x[(x<=val)].count() / x.count() * 100

full_data1 = full_data.groupby(['Id', 'Week_id'], as_index=False).agg({'Volume': percentage})
print(full_data1)
  Id Week_id      Volume
0  X       a   66.666667
1  X       c  100.000000
2  Y       a    0.000000
3  Y       b  100.000000
4  Z       a    0.000000
5  Z       b  100.000000