I have a dataframe that looks like:
In [3]: df
Out[3]:
  customer  monthly_revenue
0        a                2
1        a                4
2        a                1
3        b                3
4        b                3
5        b                3
6        b                2
7        b                5
8        c               10
9        c                5
For each customer, I want to loop through their monthly revenue numbers and calculate what fraction of them is above a certain threshold. What is the best way to do this iteration? The outcome I want is:
  customer  rev_over_2  rev_over_5
0        a        0.33         0.0
1        b        0.80         0.0
2        c        1.00         0.5
The second column is the fraction of each customer's data points that are above 2, and the third column is the fraction that are above 5.
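For reference, a plain loop over the groups that implements this definition might look like the sketch below. It assumes "over" means strictly greater than, and `share_over_loop` is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "customer": list("aaabbbbbcc"),
    "monthly_revenue": [2, 4, 1, 3, 3, 3, 2, 5, 10, 5],
})

def share_over_loop(df, thresh):
    """Hypothetical helper: fraction of each customer's rows with
    monthly_revenue strictly greater than thresh, using an explicit loop."""
    result = {}
    for customer, group in df.groupby("customer"):
        values = group["monthly_revenue"]
        above = sum(1 for v in values if v > thresh)  # count rows over thresh
        result[customer] = above / len(values)
    return result

print(share_over_loop(df, 2))
print(share_over_loop(df, 5))
```

This works, but pandas can do the same computation without a Python-level loop.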
Thank you!
Using Series groupby + transform('sum'):
thresh = 2
(df['monthly_revenue'] > thresh).groupby(df.customer).transform('sum')
Out[175]:
0 1.0
1 1.0
2 1.0
3 4.0
4 4.0
5 4.0
6 4.0
7 4.0
8 2.0
9 2.0
Name: monthly_revenue, dtype: float64
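The transform above broadcasts each customer's count back to every row. Because the mean of a boolean Series is the share of True values, aggregating with mean instead gives one fraction per customer. A minimal sketch, assuming the same sample data and strict `>`:

```python
import pandas as pd

df = pd.DataFrame({
    "customer": list("aaabbbbbcc"),
    "monthly_revenue": [2, 4, 1, 3, 3, 3, 2, 5, 10, 5],
})
thresh = 2

# Boolean mask: True where the revenue is strictly above the threshold
over = df["monthly_revenue"] > thresh

# Mean of booleans = fraction of True values, one value per customer
share = over.groupby(df["customer"]).mean()

# Same fraction, but broadcast back to every row (like the 'sum' version)
share_per_row = over.groupby(df["customer"]).transform("mean")
print(share)
```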
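Update: to get the fraction of each customer's rows above the threshold, as in the desired output, use pd.crosstab with normalize='index' and keep the True column: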
pd.crosstab(df.customer, df['monthly_revenue'] > thresh, normalize='index')[True]
Out[191]:
customer
a 0.333333
b 0.800000
c 1.000000
Name: True, dtype: float64
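To build the full table from the question with one column per threshold, one option (a sketch, not the only approach) is to create a boolean column per threshold and average them per customer. Unlike selecting `[True]` from the crosstab, this does not raise a KeyError when no row exceeds a threshold. The thresholds list and the strict `>` comparison are assumptions; the column names follow the question.

```python
import pandas as pd

df = pd.DataFrame({
    "customer": list("aaabbbbbcc"),
    "monthly_revenue": [2, 4, 1, 3, 3, 3, 2, 5, 10, 5],
})

thresholds = [2, 5]  # one output column per threshold

# One boolean column per threshold ("over" taken as strictly greater than)
flags = pd.DataFrame({
    f"rev_over_{t}": df["monthly_revenue"] > t for t in thresholds
})

# Averaging the booleans per customer gives the share of rows above each threshold
result = flags.groupby(df["customer"]).mean().reset_index()
print(result.round(2))
```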