Zanam
Zanam

Reputation: 4807

Pandas pivot table selecting rows with maximum values

I have pandas dataframe as:

df

Id      Name        CaseId       Value 
82      A1          case1.01     37.71 
1558    A3          case1.01     27.71 
82      A1          case1.06     29.54 
1558    A3          case1.06     29.54 
82      A1          case1.11     12.09 
1558    A3          case1.11     32.09 
82      A1          case1.16     33.35 
1558    A3          case1.16     33.35 

For each Id, Name pair I need to select the CaseId with maximum value.

i.e. I am seeking the following output:

Id      Name        CaseId       Value 
82      A1          case1.01     37.71
1558    A3          case1.16     33.35

I tried the following:

import pandas as pd
pd.pivot_table(df, index=['Id', 'Name'], columns=['CaseId'], values=['Value'], aggfunc=[np.max])['amax']

But all it does is for each CaseId as column it gives maximum value and not the results that I am seeking above.

Upvotes: 7

Views: 3579

Answers (4)

kepler
kepler

Reputation: 1990

Another idea is to create a joint column, take its max, then split it back to two columns:

df['ValueCase'] = list(zip(df['Value'], df['CaseId']))
p = pd.pivot_table(df, index=['Id', 'Name'], values=['ValueCase'], aggfunc='max')
p['Value'], p['CaseId'] = list(zip(*p['ValueCase']))
del p['ValueCase']

Results in:

             CaseId  Value
Id   Name                 
82   A1    case1.01  37.71
1558 A3    case1.16  33.35

Upvotes: 2

BENY
BENY

Reputation: 323376

sort_values + drop_duplicates

df.sort_values('Value').drop_duplicates(['Id'],keep='last')
Out[93]: 
     Id Name    CaseId  Value
7  1558   A3  case1.16  33.35
0    82   A1  case1.01  37.71

Since we post same time , adding more method

df.sort_values('Value').groupby('Id').tail(1)
Out[98]: 
     Id Name    CaseId  Value
7  1558   A3  case1.16  33.35
0    82   A1  case1.01  37.71

Upvotes: 7

piRSquared
piRSquared

Reputation: 294516

With nlargest and groupby

pd.concat(d.nlargest(1, ['Value']) for _, d in df.groupby('Name'))

     Id Name    CaseId  Value
0    82   A1  case1.01  37.71
7  1558   A3  case1.16  33.35

Upvotes: 2

Joe
Joe

Reputation: 12417

This should work:

df = df.sort_values('Value', ascending=False).drop_duplicates('Id').sort_index()

Output:

     Id Name    CaseId  Value
0    82   A1  case1.01  37.71
7  1558   A3  case1.16  33.35

Upvotes: 2

Related Questions