Reputation: 1644
I have a the following dataframe:
df = pd.DataFrame({
'cluster': ['A','B','C','A','B','C','D','D'],
'profit': [-1.0,1.5,1,0.5,3.0,-2,-1, -2]
})
I am performing a series of groupby operations before outputting into another datafram, most of which I have got to work.
df['cluster_total_profit'] = df.groupby(['cluster'])['profit'].transform('sum')
df['cluster_mean_profit'] = df.groupby(['cluster'])['profit'].transform('mean')
df['occurances'] = df.groupby(['cluster'])['profit'].transform('count')
df['std'] = df.groupby(['cluster'])['profit'].transform('std')
clusters = df[['cluster','cluster_total_profit', 'cluster_mean_profit', 'occurances', 'std']].drop_duplicates().reset_index(drop=True)
The output is as follows:
cluster cluster_total_profit cluster_mean_profit occurances std
0 A -0.5 -0.25 2 1.06066
1 B 4.5 2.25 2 1.06066
2 C -1.0 -0.50 2 2.12132
3 D -3.0 -1.50 2 0.707107
The last transformation I am trying to get to work is to count the number of profitable events in each group and populate the df with the number of such events. The the output could be collected in the table above which would look like the following:
cluster cluster_total_profit cluster_mean_profit occurances std profitable_events
0 A -0.5 -0.25 2 1.06066 1
1 B 4.5 2.25 2 1.06066 2
2 C -1.0 -0.50 2 2.12132 1
3 D -3.0 -1.50 2 0.707107 0
I have looked here and here but I can't quite get the examples to tranlate to my exact usecase. Here is my code:
df['profitable_events'] = df.cluster.map(df.groupby(['cluster']).filter(lambda x: x[x['profit'] > 0.0].count()))
clusters = df[['cluster','cluster_total_profit', 'cluster_mean_profit', 'occurances', 'std', 'profitable_events']].drop_duplicates().reset_index(drop=True)
and:
df['profitable_events'] = df.groupby(['cluster']).filter(lambda x: x[x['profit'] > 0.0]).transform('count')
Both throw an error "TypeError: filter function returned a Series, but expected a scalar bool"
I also tried:
df['profitable_events'] = df.cluster.map(df.groupby(['cluster']).filter(lambda x: len(x[x['profit'] > 0.0].index)))
Which errored: "TypeError: filter function returned a int, but expected a scalar bool"
I'm sure there is a quic fix but am not sure what it is?
Many thanks in advance
Upvotes: 1
Views: 232
Reputation: 150735
You can use a custom function to count the profitable event:
df.groupby('cluster')['profit'].agg([
'sum','mean','count','std',
('profitable_event', lambda x: x.gt(0).sum())
])
Output
sum mean count std profitable_event
cluster
A -0.5 -0.25 2 1.060660 1.0
B 4.5 2.25 2 1.060660 2.0
C -1.0 -0.50 2 2.121320 1.0
D -3.0 -1.50 2 0.707107 0.0
Upvotes: 4