Python Pandas: how to iterate through rows with a common column value

Question

I have a dataframe that looks like:

In [3]: df
Out[3]: 
   customer  monthly_revenue
0        a                2
1        a                4
2        a                1
3        b                3
4        b                3
5        b                3
6        b                2
7        b                5
8        c               10
9        c                5

For each customer, I want to loop through their monthly revenue numbers and calculate how many data points are over or under a certain threshold. What is the best way to do the iteration here? The outcome I want is:

  customer  rev_over_2  rev_over_5
0        a        0.33         0.0
1        b        0.80         0.2
2        c        1.00         1.0

The second column is the proportion of each customer's data points that are over 2, and the third column is the proportion that are over 5.
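A literal per-customer iteration is possible by looping over df.groupby('customer'), which yields each customer's name together with the sub-frame of that customer's rows. The sketch below assumes the sample data above; the names rows and result are illustrative only. One detail in the table: the 0.2 and 1.0 in rev_over_5 only come out if a revenue of exactly 5 counts (>= 5). A strict > 5 gives 0.0, 0.0 and 0.5 instead, so the sketch uses >= for that column in order to reproduce the table.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "customer": list("aaabbbbbcc"),
    "monthly_revenue": [2, 4, 1, 3, 3, 3, 2, 5, 10, 5],
})

rows = []
# groupby yields (key, sub-DataFrame) pairs, one pair per customer
for customer, group in df.groupby("customer"):
    rev = group["monthly_revenue"]
    rows.append({
        "customer": customer,
        # mean of a boolean Series = fraction of True values
        "rev_over_2": (rev > 2).mean(),
        # >= so that a revenue of exactly 5 counts, matching the table's 0.2 / 1.0
        "rev_over_5": (rev >= 5).mean(),
    })

result = pd.DataFrame(rows)
print(result.round(2))
```

Looping over groups works for a handful of customers, but the vectorised groupby and crosstab approaches avoid Python-level iteration and scale better.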

Thank you!

BENY · Accepted Answer

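Using a boolean Series with groupby + transform('sum'). For each customer this counts the rows whose revenue is above the threshold and broadcasts that count back onto every row: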

thresh = 2
(df['monthly_revenue'] > thresh).groupby(df.customer).transform('sum')
Out[175]: 
0    1.0
1    1.0
2    1.0
3    4.0
4    4.0
5    4.0
6    4.0
7    4.0
8    2.0
9    2.0
Name: monthly_revenue, dtype: float64
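Because transform('sum') counts the True values, the output above is a per-row count rather than the proportion the question asks for. The mean of a boolean Series is the fraction of True values, so swapping 'sum' for 'mean' gives the proportions directly. A minimal sketch, assuming the sample data from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "customer": list("aaabbbbbcc"),
    "monthly_revenue": [2, 4, 1, 3, 3, 3, 2, 5, 10, 5],
})

thresh = 2
over = df["monthly_revenue"] > thresh  # True where revenue is strictly above thresh

# Aggregating with mean gives one proportion per customer (index: a, b, c).
share_per_customer = over.groupby(df["customer"]).mean()

# transform('mean') broadcasts each customer's proportion back onto its original rows.
share_per_row = over.groupby(df["customer"]).transform("mean")

print(share_per_customer)
print(share_per_row)
```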

Update

import pandas as pd
pd.crosstab(df.customer, df['monthly_revenue'] > thresh, normalize='index')[True]
Out[191]: 
customer
a    0.333333
b    0.800000
c    1.000000
Name: True, dtype: float64
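To get several thresholds into the layout from the question, the crosstab can be repeated per threshold and the True columns collected into one frame. This is a sketch under stated assumptions: share_over is a hypothetical helper name, and it falls back to 0 when no row exceeds the threshold (crosstab then produces no True column, and indexing with [True] would raise a KeyError). The comparison is strict (>), as in the answer, so rev_over_5 comes out as 0.0, 0.0 and 0.5; use >= inside the helper if a revenue of exactly 5 should count.

```python
import pandas as pd

df = pd.DataFrame({
    "customer": list("aaabbbbbcc"),
    "monthly_revenue": [2, 4, 1, 3, 3, 3, 2, 5, 10, 5],
})

def share_over(df, thresh):
    """Hypothetical helper: fraction of each customer's rows with revenue > thresh."""
    ct = pd.crosstab(df["customer"], df["monthly_revenue"] > thresh, normalize="index")
    # If no row exceeds thresh, the True column is absent; treat that as 0 for everyone.
    return ct.get(True, pd.Series(0.0, index=ct.index))

# One column per threshold, aligned on the customer index.
result = (
    pd.DataFrame({f"rev_over_{t}": share_over(df, t) for t in (2, 5)})
    .rename_axis("customer")
    .reset_index()
)
print(result)
```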
