how to keep the value of a column that has the highest value on another column with groupby in pandas

Question

I have the following dataframe:

import pandas as pd

df = pd.DataFrame({'var': ['A', 'A', 'B', 'B', 'C', 'C', 'C'],
                       'value': [1, 2, 1, 2, 3, 4, 5],
                       'input': [0.1, 0.1, 0.2, 0.2, 0.3, 0.3, 0.3]})

I would like to keep the var for which the value is the highest by input and set the rest of the var to NA.

So I would like to end up with:

   df = pd.DataFrame({'var': [np.nan, 'A', np.nan, 'B', np.nan, np.nan, 'C'],
                       'value': [1, 2, 1, 2, 3, 4, 5],
                       'input': [0.1, 0.1, 0.2, 0.2, 0.3, 0.3, 0.3]})

Any ideas ?

jezrael · Accepted Answer

Use GroupBy.transform with max for Series with same size like original DataFrame, compare for not equal by Series.ne and set new values with loc:

mask = df.groupby('var')['value'].transform('max').ne(df['value'])

df.loc[mask, 'var'] = np.nan
print (df)
   var  value  input
0  NaN      1    0.1
1    A      2    0.2
2  NaN      1    0.3
3    B      2    0.4
4  NaN      3    0.5
5  NaN      4    0.6
6    C      5    0.7

how to keep the value of a column that has the highest value on another column with groupby in pandas

Answers (2)

Related Questions