pajamas

Reputation: 1266

Pandas for partially duplicate rows, keep row and replace data with minimum or maximum value

This is what my initial dataframe looks like:

import pandas as pd

df = pd.DataFrame({'a': ['a', 'b', 'b', 'c'],
                   'b': [1, 2, 3, 4],
                   'c': [2, 3, 4, 1],
                   'd': [1.1, 1.2, 1.3, 1.4]})

    a   b   c   d
0   a   1   2   1.1
1   b   2   3   1.2
2   b   3   4   1.3
3   c   4   1   1.4

For rows that share a value in column a, I want to keep every row but replace column b with the minimum of b and column c with the maximum of c across those rows.

The output should be like this:


    a   b   c   d
0   a   1   2   1.1
1   b   2   4   1.2
2   b   2   4   1.3
3   c   4   1   1.4

Is there a pandas function that does that? I tried looking into

pandas.DataFrame.drop_duplicates
pandas.DataFrame.duplicated 

However, I didn't find anything that will work for my use case.

Upvotes: 1

Views: 1010

Answers (3)

wwnde

Reputation: 26676

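# Boolean mask: True for every row whose value in column a occurs more than once
m = df['a'].duplicated(keep=False)
# Overwrite b and c on the masked rows with the min of b and the max of c.
# Both are computed over all masked rows together, which matches the desired
# output here because only one value of a ('b') is duplicated.
df.loc[m, ['b', 'c']] = df.loc[m, 'b'].min(), df.loc[m, 'c'].max()
print(df)

   a  b  c    d
0  a  1  2  1.1
1  b  2  4  1.2
2  b  2  4  1.3
3  c  4  1  1.4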
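
One thing worth checking with the mask approach: the min and max are taken across all duplicated rows at once, not per value of a. A small sketch on made-up data (the five-row frame below is hypothetical, with two duplicated keys instead of one) shows what that does:

```python
import pandas as pd

# Hypothetical data: both 'b' and 'c' occur twice in column a
df = pd.DataFrame({'a': ['a', 'b', 'b', 'c', 'c'],
                   'b': [1, 2, 3, 4, 5],
                   'c': [2, 3, 4, 1, 9]})

# Same mask-and-assign logic as above, applied to a copy
m = df['a'].duplicated(keep=False)
masked = df.copy()
masked.loc[m, ['b', 'c']] = df.loc[m, 'b'].min(), df.loc[m, 'c'].max()

print(masked)
```

Here every duplicated row ends up with b = 2 and c = 9, including the rows where a is 'c', so this approach is probably only suitable when a single key is duplicated.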

Upvotes: 0

IoaTzimas

Reputation: 10624

The following should work:

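# Per-group aggregates: minimum of b and maximum of c for each value of a
df2 = df.groupby('a', as_index=False).agg({'b': 'min', 'c': 'max'})
# Merge back onto the original rows; b_x and c_x are the aggregated columns from df2
result = df2.merge(df, on='a')[['a', 'b_x', 'c_x', 'd']]
result.columns = df.columns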

>>> print(result)
   a  b  c    d
0  a  1  2  1.1
1  b  2  4  1.2
2  b  2  4  1.3
3  c  4  1  1.4

Upvotes: 0

jezrael

Reputation: 863166

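Use GroupBy.transform with min and max. It returns a Series aligned with the original rows, in which every row of a group holds that group's aggregated value, so groups with a single row keep their own values: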

df = pd.DataFrame({'a': ['a', 'b', 'b', 'c'],
                   'b': [1, 2, 3, 4],
                   'c': [2, 3, 4, 1],
                   'd': [1.1, 1.2, 1.3, 1.4]})

df['b'] = df.groupby('a')['b'].transform('min')
df['c'] = df.groupby('a')['c'].transform('max')
print(df)
   a  b  c    d
0  a  1  2  1.1
1  b  2  4  1.2
2  b  2  4  1.3
3  c  4  1  1.4
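
If more columns need the same treatment, the two transform calls can be wrapped in a small helper. This is only a sketch: the function name broadcast_group_stats, its rules argument, and the five-row sample frame are made up for illustration and are not part of pandas.

```python
import pandas as pd

def broadcast_group_stats(df, key, rules):
    """Return a copy of df in which each column named in rules is replaced
    by its per-group aggregate, repeated on every row of the group.

    rules maps a column name to an aggregation name, e.g. {'b': 'min'}.
    (Hypothetical helper, not a pandas function.)
    """
    out = df.copy()
    grouped = df.groupby(key)
    for col, func in rules.items():
        # transform keeps the original index, so the assignment aligns row by row
        out[col] = grouped[col].transform(func)
    return out

# Hypothetical data with two duplicated keys in column a
df = pd.DataFrame({'a': ['a', 'b', 'b', 'c', 'c'],
                   'b': [1, 2, 3, 4, 5],
                   'c': [2, 3, 4, 1, 9],
                   'd': [1.1, 1.2, 1.3, 1.4, 1.5]})

result = broadcast_group_stats(df, 'a', {'b': 'min', 'c': 'max'})
print(result)
```

Each key gets its own min and max here, and column d and the row order are left untouched.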

Upvotes: 2
