Reputation: 23
I'm working on an organisational matrix, and am trying to reproduce a task in Python that is very inefficiently done using MS Access usually. There probably is a simple answer to this issue but i've not been able to find it anywhere, apologies if this has already been answered.
Let's posit a matrix in a dict and put it in a DF:
matrix ={'Flow':['Flow1','Flow2','Flow3','Flow4','Flow6']*6,
'User':['Jill','Jacky','Joanie','Peter','Paul','Paddy']*5,
'Role':['Requestor','Manager','Approver']*10}
mydf=pd.DataFrame(matrix)
This gives me a table of flows for which there are three roles, each of which can be fulfilled by several possible individuals (only two here, many more in reality).
For each flow/role pair, i would like to get a ranking by name, so that my data looks like this:
Flow User Role Rank
0 Flow1 Jill Requestor 1
5 Flow1 Paddy Approver 1
10 Flow1 Paul Manager 1
15 Flow1 Peter Requestor 2
20 Flow1 Joanie Approver 2
25 Flow1 Jacky Manager 2
and so on for each flows. This can be a first come first serve basis, alphabetical order does not matter really.
Supposedly, this should be done by using groupby.rank()
mydf['Rank']=mydf.groupby(['Flow','Role'])['User'].rank(method='dense')
but this returns a 'NoneType' object is not callable error. My end goal is to pivot the matrix to have each role represented in a column, but i really need this ranking to be able to represent the data for all users.
Please tell me what i'm doing wrong, perhaps i need to use .apply() instead, not sure how though.
Upvotes: 2
Views: 2918
Reputation: 71687
You can use .groupby
, along with .cumcount
:
mydf['Rank'] = mydf.groupby(['Flow','Role'])['User'].cumcount().add(1)
Result:
# print(mydf[mydf['Flow'].eq('Flow1')])
Flow User Role Rank
0 Flow1 Jill Requestor 1
5 Flow1 Paddy Approver 1
10 Flow1 Paul Manager 1
15 Flow1 Peter Requestor 2
20 Flow1 Joanie Approver 2
25 Flow1 Jacky Manager 2
Upvotes: 4