Albert
Albert

Reputation: 311

Python Pandas - Highest values from multiple columns of duplicated entries

I have the following df:

A        B   C
entry1   10  3
entry1   7   5 
entry1   2   NaN
entry2   1   1
entry2   2   4
entry2   3   3  
...  

In column A I have duplicated entries and I am interested to pick the highest values from column B and C for each entry. In case that NaN value exists, this overwrites the highest values.

Example:

For entry1, I would like to keep in the df just value 10 in column B and NaN in column C.

For entry2, I would like to keep in the df just value 3 in column B and 4 in column C.

I was thinking to sort the values and then to pick the last value like this:

df = df.sort_values(by=['A', 'B', 'C'], ascending=[False, True, True])
print(df.drop_duplicates(subset='A', keep='last'))

In this way, the remaining entry will have just the highest value for column B, not also for column C.

How I can achieve that as well?

Thank you.

Upvotes: 2

Views: 815

Answers (1)

Henry Ecker
Henry Ecker

Reputation: 35626

One approach would be to fillna the nan with inf then just take the groupby max to get the max value per group. Then replace the inf back to nan. This ensures that a nan value will be maximal per group since inf is:

cols = ['B', 'C']
df[cols] = df[cols].fillna(np.inf)  # replace NaN with largest value
df = df.groupby('A', as_index=False)[cols].max()  # Get Max Per Group
df[cols] = df[cols].replace(np.inf, np.nan)  # Return to NaN

Alternatively use a function or lambda to produce nan if any values are NaN otherwise the max:

df = (
    df.groupby('A', as_index=False)[['B', 'C']]
        .agg(lambda s: np.nan if s.isna().any() else s.max())
)

Both Produce df:

        A   B    C
0  entry1  10  NaN
1  entry2   3  4.0

DataFrame and imports:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['entry1', 'entry1', 'entry1', 'entry2', 'entry2', 'entry2'],
    'B': [10, 7, 2, 1, 2, 3],
    'C': [3.0, 5.0, np.nan, 1.0, 4.0, 3.0]
})

Upvotes: 3

Related Questions