Reputation: 1280
What's an efficient way to extract second index after groupby
against the maximum customer count per group.
Assume a dataframe df
with various states and 10 officers (names Officer 1
to Officer 10
) in every state. The column Current Status
will always have value Customer
:
State List Sales Officer Current Status
0 UP Officer 4 Customer
1 MH Officer 5 Customer
2 AP Officer 6 Customer
3 AN Officer 2 Customer
4 GJ Officer 3 Customer
.... so on
The expected output consists of sales officer with highest customer count in each state:
State List Sales Officer
AN Officer 6 403
AP Officer 1 266
Officer 8 266
... and so on
So far I've performed the following:
df.groupby(['State List', 'Sales Officer'])['Current Status'].count()#.reset_index()
giving me the following:
State List Sales Officer
AN Officer 1 376
Officer 10 401
Officer 2 353
Officer 3 373
Officer 4 375
Officer 5 382
Officer 6 403
Officer 7 400
Officer 8 385
Officer 9 378
AP Officer 1 266
Officer 10 228
Officer 2 240
Officer 3 248
Officer 4 235
Officer 5 229
Officer 6 242
Officer 7 238
Officer 8 266
Officer 9 243
Now, I'm stuck in fetching out the Sales Officer
against each State List
with maximum customer count. Any ideas!
Upvotes: 4
Views: 44
Reputation: 862641
Use boolean indexing
with transform
of max
for return Series
with same size like original:
s = df.groupby(['State List', 'Sales Officer'])['Current Status'].count()
df = s[s == s.groupby('State List').transform('max')]
print (df)
State List Sales Officer
AN Officer 6 403
AP Officer 1 266
Officer 8 266
Name: a, dtype: int64
Detail:
print (s.groupby('State List').transform('max'))
State List Sales Officer
AN Officer 1 403
Officer 10 403
Officer 2 403
Officer 3 403
Officer 4 403
Officer 5 403
Officer 6 403
Officer 7 403
Officer 8 403
Officer 9 403
AP Officer 1 266
Officer 10 266
Officer 2 266
Officer 3 266
Officer 4 266
Officer 5 266
Officer 6 266
Officer 7 266
Officer 8 266
Officer 9 266
Name: a, dtype: int64
Upvotes: 4