Count how many rows within the same group have a larger value in a given column for each row in Pandas DataFrame

Question

I have a pandas dataframe with a group field and variable of interest. For each row in the dataframe I want to count how many rows within the same group have a larger value for the variable of interest.

Below is an example of what I'm trying to achieve:

import pandas as pd
df = pd.DataFrame(data=[['a', 1], ['a', 2], ['a', 2], ['a', 3], ['b', 4], ['b', 2], ['b', 6]],
                  columns=['groups', 'value'])
df

  groups value
0   a      1
1   a      2
2   a      2
3   a      3
4   b      4
5   b      2
6   b      6

Here is the output I'm hoping to receive:

  groups value what_i_want
0   a      1        3
1   a      2        1
2   a      2        1
3   a      3        0
4   b      4        1
5   b      2        2
6   b      6        0

I know I could get this answer by looping through each row of the dataframe. However, iterating over the rows of a dataframe is a last resort, and my full dataset is much bigger, so a loop would take a long time to run. I'm assuming there is some way to do this with groupby or apply, but I can't figure it out.
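For reference, the row-by-row loop the question describes might look like the sketch below. It is only meant as a slow baseline for checking a vectorized answer on small data; the use of `iterrows` and the `counts` list are illustrative choices, not code from the original question.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['a', 1], ['a', 2], ['a', 2], ['a', 3], ['b', 4], ['b', 2], ['b', 6]],
    columns=['groups', 'value'],
)

# Row-by-row baseline: for each row, select every row in the same group
# and count how many of them have a strictly larger value.
# This is O(n^2) overall, which is why it does not scale.
counts = []
for _, row in df.iterrows():
    same_group = df.loc[df['groups'] == row['groups'], 'value']
    counts.append(int((same_group > row['value']).sum()))

df['what_i_want'] = counts
print(df)
```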

Thanks!

BENY · Accepted Answer

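IIUC, you can use rank. Negating the values makes the largest value in each group rank 1, method='min' gives tied rows the same (lowest) rank, and subtracting 1 leaves the number of rows in the group with a strictly larger value.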

(-df.value).groupby(df['groups']).rank(method='min')-1
Out[466]: 
0    3.0
1    1.0
2    1.0
3    0.0
4    1.0
5    2.0
6    0.0
Name: value, dtype: float64

# assign back as an integer column, named as in the question's expected output
df['what_i_want'] = ((-df.value).groupby(df['groups']).rank(method='min') - 1).astype(int)
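A sketch of the same idea without negating the column: ranking in descending order with `ascending=False` and `method='min'` should be equivalent to ranking `-df.value` in ascending order. The comparison with the default `method='average'` (the `avg` name is just illustrative) shows why the tie handling matters for the two tied 2s in group `a`.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['a', 1], ['a', 2], ['a', 2], ['a', 3], ['b', 4], ['b', 2], ['b', 6]],
    columns=['groups', 'value'],
)

# Rank descending within each group. method='min' gives every tied value
# the lowest rank of its tie block, so (rank - 1) is the number of rows
# in the same group with a strictly larger value.
ranked = df.groupby('groups')['value'].rank(method='min', ascending=False)
df['what_i_want'] = (ranked - 1).astype(int)

# For comparison, the default method='average' splits ties, so the two
# tied 2s in group 'a' would get 1.5 instead of 1.
avg = df.groupby('groups')['value'].rank(ascending=False) - 1

print(df)
print(avg.tolist())
```

If `value` can contain NaN, `rank` returns NaN for those rows by default, so `astype(int)` would fail; keeping the float result or converting to pandas' nullable `'Int64'` dtype avoids that.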
