Reputation: 4482
I have the following dataframe:
import pandas as pd
df = pd.DataFrame({'var': ['A', 'A', 'B', 'B', 'C', 'C', 'C'],
'value': [1, 2, 1, 2, 3, 4, 5],
'input': [0.1, 0.1, 0.2, 0.2, 0.3, 0.3, 0.3]})
I would like to keep the var
for which the value
is the highest by input
and set the rest of the var
to NA
.
So I would like to end up with:
df = pd.DataFrame({'var': [np.nan, 'A', np.nan, 'B', np.nan, np.nan, 'C'],
'value': [1, 2, 1, 2, 3, 4, 5],
'input': [0.1, 0.1, 0.2, 0.2, 0.3, 0.3, 0.3]})
Any ideas ?
Upvotes: 1
Views: 39
Reputation: 2757
where
is a useful trick to set NaNs in a column based on some conditions.
df['var'].where(df.groupby('var').value.transform('max').eq(df['value']))
Output
0 NaN
1 A
2 NaN
3 B
4 NaN
5 NaN
6 C
Upvotes: 1
Reputation: 862901
Use GroupBy.transform
with max
for Series
with same size like original DataFrame, compare for not equal by Series.ne
and set new values with loc
:
mask = df.groupby('var')['value'].transform('max').ne(df['value'])
df.loc[mask, 'var'] = np.nan
print (df)
var value input
0 NaN 1 0.1
1 A 2 0.2
2 NaN 1 0.3
3 B 2 0.4
4 NaN 3 0.5
5 NaN 4 0.6
6 C 5 0.7
Upvotes: 3