Using Pandas Rank on strings in DataFrameGroupBy

Question

I'm working on an organisational matrix, and am trying to reproduce a task in Python that is very inefficiently done using MS Access usually. There probably is a simple answer to this issue but i've not been able to find it anywhere, apologies if this has already been answered.

Let's posit a matrix in a dict and put it in a DF:

matrix ={'Flow':['Flow1','Flow2','Flow3','Flow4','Flow6']*6,
     'User':['Jill','Jacky','Joanie','Peter','Paul','Paddy']*5,
     'Role':['Requestor','Manager','Approver']*10}
mydf=pd.DataFrame(matrix)

This gives me a table of flows for which there are three roles, each of which can be fulfilled by several possible individuals (only two here, many more in reality).

For each flow/role pair, i would like to get a ranking by name, so that my data looks like this:

   Flow    User       Role      Rank
0   Flow1  Jill       Requestor 1
5   Flow1  Paddy      Approver  1
10  Flow1  Paul       Manager   1
15  Flow1  Peter      Requestor 2
20  Flow1  Joanie     Approver  2
25  Flow1  Jacky      Manager   2

and so on for each flows. This can be a first come first serve basis, alphabetical order does not matter really.

Supposedly, this should be done by using groupby.rank()

mydf['Rank']=mydf.groupby(['Flow','Role'])['User'].rank(method='dense')

but this returns a 'NoneType' object is not callable error. My end goal is to pivot the matrix to have each role represented in a column, but i really need this ranking to be able to represent the data for all users.

Please tell me what i'm doing wrong, perhaps i need to use .apply() instead, not sure how though.

Shubham Sharma · Accepted Answer

You can use .groupby, along with .cumcount:

mydf['Rank'] = mydf.groupby(['Flow','Role'])['User'].cumcount().add(1)

Result:

# print(mydf[mydf['Flow'].eq('Flow1')])

    Flow    User       Role  Rank
0   Flow1    Jill  Requestor     1
5   Flow1   Paddy   Approver     1
10  Flow1    Paul    Manager     1
15  Flow1   Peter  Requestor     2
20  Flow1  Joanie   Approver     2
25  Flow1   Jacky    Manager     2

Using Pandas Rank on strings in DataFrameGroupBy

Answers (1)

Related Questions