Python Pandas - Highest values from multiple columns of duplicated entries

Question

I have the following df:

A        B   C
entry1   10  3
entry1   7   5 
entry1   2   NaN
entry2   1   1
entry2   2   4
entry2   3   3  
...

In column A I have duplicated entries and I am interested to pick the highest values from column B and C for each entry. In case that NaN value exists, this overwrites the highest values.

Example:

For entry1, I would like to keep in the df just value 10 in column B and NaN in column C.

For entry2, I would like to keep in the df just value 3 in column B and 4 in column C.

I was thinking to sort the values and then to pick the last value like this:

df = df.sort_values(by=['A', 'B', 'C'], ascending=[False, True, True])
print(df.drop_duplicates(subset='A', keep='last'))

In this way, the remaining entry will have just the highest value for column B, not also for column C.

How I can achieve that as well?

Thank you.

Henry Ecker · Accepted Answer

One approach would be to fillna the nan with inf then just take the groupby max to get the max value per group. Then replace the inf back to nan. This ensures that a nan value will be maximal per group since inf is:

cols = ['B', 'C']
df[cols] = df[cols].fillna(np.inf)  # replace NaN with largest value
df = df.groupby('A', as_index=False)[cols].max()  # Get Max Per Group
df[cols] = df[cols].replace(np.inf, np.nan)  # Return to NaN

Alternatively use a function or lambda to produce nan if any values are NaN otherwise the max:

df = (
    df.groupby('A', as_index=False)[['B', 'C']]
        .agg(lambda s: np.nan if s.isna().any() else s.max())
)

Both Produce df:

        A   B    C
0  entry1  10  NaN
1  entry2   3  4.0

DataFrame and imports:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['entry1', 'entry1', 'entry1', 'entry2', 'entry2', 'entry2'],
    'B': [10, 7, 2, 1, 2, 3],
    'C': [3.0, 5.0, np.nan, 1.0, 4.0, 3.0]
})

Python Pandas - Highest values from multiple columns of duplicated entries

Answers (1)

Related Questions