Reputation: 3058

Keeping the lowest value of duplicates across multiple columns in dataframe

I have the following data frame:

import pandas as pd
data = dict(name=['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c'],
            objective=[20.0, 20.0, 25.0, 40.0, 40.5, 41.0, 60.0, 60.0],
            price=[0.5, 1.0, 1.5, 1.0, 1.2, 1.4, 0.5, 1.0])
df = pd.DataFrame(data, columns=data.keys())

I can then find the unique combinations of all these as follows:

df.groupby(['name','objective', 'price']).size()

which looks like this:

name  objective  price
a     20.0       0.5      1
                 1.0      1
      25.0       1.5      1
b     40.0       1.0      1
      40.5       1.2      1
      41.0       1.4      1
c     60.0       0.5      1
                 1.0      1

When there are multiple price values for a given name and objective, I want to keep the lower price value only, i.e.

name  objective  price
a     20.0       0.5      1
      25.0       1.5      1
b     40.0       1.0      1
      40.5       1.2      1
      41.0       1.4      1
c     60.0       0.5      1

How can I achieve this, please?

Upvotes: 0

Answers (3)

BENY

Reputation: 323316

What I will do

df.sort_values('price').drop_duplicates(['name','objective'],keep='last').assign(cnt=1)
Out[421]: 
  name  objective  price  cnt
0    a       20.0    0.5    1
2    a       25.0    1.5    1
3    b       40.0    1.0    1
4    b       40.5    1.2    1
5    b       41.0    1.4    1
6    c       60.0    0.5    1

Upvotes: 2

Aldric

Reputation: 507

you can use groupby and minimum

df = df.groupby(['name','objective']).min()

Upvotes: 1

Quang Hoang

Reputation: 150775

You can do another groupby and first:

(df.groupby(['name','objective', 'price']).size()
   .reset_index()
   .groupby(['name', 'objective'])
   .first()
)

Output:

                price  0
name objective          
a    20.0         0.5  1
     25.0         1.5  1
b    40.0         1.0  1
     40.5         1.2  1
     41.0         1.4  1
c    60.0         0.5  1

Upvotes: 2

Keeping the lowest value of duplicates across multiple columns in dataframe

Answers (3)

Related Questions