Using `rank` on a pandas DataFrameGroupBy object

Question

I have some simple data in a dataframe consisting of three columns [id, country, volume] where the index is 'id'.

I can perform simple operations like:

df_vol.groupby('country').sum()

and it works as expected. When I try to use rank(), however, it does not work as expected and the result is an empty DataFrame:

df_vol.groupby('country').rank()

The behavior is inconsistent: in some cases it does work. The following also works as expected:

df_vol.rank()
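To make the difference between the two calls concrete, here is a small sketch. The sample data is hypothetical (the question does not include its data), and it assumes `volume` is numeric. Grouping by `country` ranks `volume` within each country, while `df_vol.rank()` ranks every row against all the others.

```python
import pandas as pd

# Hypothetical data shaped like the question's: columns id, country, volume, index 'id'
df_vol = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                       'country': ['en', 'us', 'us', 'en', 'en'],
                       'volume': [10, 10, 30, 20, 50]}).set_index('id')

# Rank within each country; 'country' is the group key, so only 'volume' is ranked
per_group = df_vol.groupby('country').rank()

# Rank across all rows, ignoring countries (numeric columns only)
overall = df_vol.rank(numeric_only=True)

print(per_group)
print(overall)
```

Here `per_group['volume']` restarts at 1 for each country, while `overall['volume']` gives the two tied 10s an average rank of 1.5 across the whole frame.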

I want to return something like:

vols = []
for _, group in df_vol.groupby('country'):
    vols.append(group['volume'].rank())
pd.concat(vols)
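A runnable version of that loop, assuming hypothetical sample data with a unique `id` index, shows that it produces the same per-group ranks as a single grouped call once the rows are put back in their original order. `pd.concat` stacks the groups one after another, so the loop's output comes back ordered by country rather than by `id`.

```python
import pandas as pd

# Hypothetical sample data; the question's real data is not shown
df_vol = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                       'country': ['en', 'us', 'us', 'en', 'en'],
                       'volume': [10, 10, 30, 20, 50]}).set_index('id')

# The loop from the question: rank 'volume' separately inside each country group
vols = []
for _, group in df_vol.groupby('country'):
    vols.append(group['volume'].rank())
# concat yields the 'en' rows first, then 'us'; reindex restores the original row order
looped = pd.concat(vols).reindex(df_vol.index)

# The same ranking in one grouped call
grouped = df_vol.groupby('country')['volume'].rank()

print(looped.equals(grouped))  # True
```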

Any ideas why? Much appreciated!

jezrael · Accepted Answer

You can select the column with [], so that rank is called only for the column volume:

df_vol.groupby('country')['volume'].rank()
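A short sketch of why this works, using the same sample data as below: selecting a single column turns the DataFrameGroupBy into a SeriesGroupBy, and its rank() returns a Series with the same index as the original frame. That is what allows the result to be assigned straight back as a new column.

```python
import pandas as pd

df_vol = pd.DataFrame({'country': ['en', 'us', 'us', 'en', 'en'],
                       'volume': [10, 10, 30, 20, 50]})

# Selecting one column gives a SeriesGroupBy instead of a DataFrameGroupBy
grouped_volume = df_vol.groupby('country')['volume']

# rank() returns a Series aligned with df_vol's index, one rank per original row
ranks = grouped_volume.rank()

print(type(grouped_volume).__name__)     # SeriesGroupBy
print(ranks.index.equals(df_vol.index))  # True
```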

Sample:

import pandas as pd

df_vol = pd.DataFrame({'country':['en','us','us','en','en'],
                       'id':[1,1,1,2,2],
                       'volume':[10,10,30,20,50]})
print(df_vol)
  country  id  volume
0      en   1      10
1      us   1      10
2      us   1      30
3      en   2      20
4      en   2      50

df_vol['r'] = df_vol.groupby('country')['volume'].rank()
print(df_vol)
  country  id  volume    r
0      en   1      10  1.0
1      us   1      10  1.0
2      us   1      30  2.0
3      en   2      20  2.0
4      en   2      50  3.0
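The grouped rank() also accepts the usual ranking options such as ascending and method. The data below is hypothetical and includes a tie inside the en group, so the sketch can show how the tie-breaking methods differ when the largest volume is given rank 1.

```python
import pandas as pd

# Hypothetical data with a tie (two 50s) inside the 'en' group
df = pd.DataFrame({'country': ['en', 'en', 'en', 'us', 'us'],
                   'volume': [50, 20, 50, 10, 30]})

g = df.groupby('country')['volume']

# Largest volume gets rank 1; by default tied rows share the average of their positions
df['r_avg'] = g.rank(ascending=False)
# 'min': tied rows get the lowest rank of the tie, and the next rank is skipped
df['r_min'] = g.rank(method='min', ascending=False)
# 'dense': tied rows get the same rank and no rank is skipped
df['r_dense'] = g.rank(method='dense', ascending=False)
print(df)
```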
