Reputation: 1266
This is what my initial dataframe looks like:
import pandas as pd

df = pd.DataFrame({'a': ['a', 'b', 'b', 'c'],
                   'b': [1, 2, 3, 4],
                   'c': [2, 3, 4, 1],
                   'd': [1.1, 1.2, 1.3, 1.4]})
   a  b  c    d
0  a  1  2  1.1
1  b  2  3  1.2
2  b  3  4  1.3
3  c  4  1  1.4
For rows that share the same value in column a, I want every one of those rows to hold the minimum of column b and the maximum of column c for that value. All rows should be kept, and column d should stay unchanged.
The output should look like this:
   a  b  c    d
0  a  1  2  1.1
1  b  2  4  1.2
2  b  2  4  1.3
3  c  4  1  1.4
Is there a pandas function that does that? I tried looking into
pandas.DataFrame.drop_duplicates
pandas.DataFrame.duplicated
However, I didn't find anything that will work for my use case.
Upvotes: 1
Views: 1010
Reputation: 26676
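# Boolean mask: True for every row whose value in column a occurs more than once
m = df['a'].duplicated(keep=False)
# Overwrite b and c on the masked rows with the minimum of b and the maximum of c.
# Note: the min and max are taken over all masked rows together, so this gives
# the requested result only when a single value of a is duplicated.
df.loc[m, ['b', 'c']] = df.loc[m, 'b'].min(), df.loc[m, 'c'].max()
print(df)
   a  b  c    d
0  a  1  2  1.1
1  b  2  4  1.2
2  b  2  4  1.3
3  c  4  1  1.4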
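The mask approach computes one minimum and one maximum across every duplicated row, which matches the question's data only because a single value of a is repeated. The sketch below uses made-up data with two duplicated groups (an assumption, not part of the question) to show the difference, and restricts a per-group transform to the masked rows as one possible adjustment.

```python
import pandas as pd

# Hypothetical data: both 'b' and 'c' are duplicated in column a.
df = pd.DataFrame({'a': ['a', 'b', 'b', 'c', 'c'],
                   'b': [1, 2, 3, 4, 5],
                   'c': [2, 3, 4, 1, 6]})

# True for every row whose value in column a occurs more than once.
m = df['a'].duplicated(keep=False)

# Global version: a single min/max over all masked rows, regardless of group.
global_fill = df.copy()
global_fill.loc[m, 'b'] = df.loc[m, 'b'].min()
global_fill.loc[m, 'c'] = df.loc[m, 'c'].max()

# Per-group version: min/max computed separately for each value of a.
per_group = df.copy()
per_group.loc[m, 'b'] = df[m].groupby('a')['b'].transform('min')
per_group.loc[m, 'c'] = df[m].groupby('a')['c'].transform('max')

print(global_fill)
print(per_group)
```

In the global version the rows for 'b' and 'c' all receive the same pair of values, while the per-group version keeps each group's own minimum and maximum.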
Upvotes: 0
Reputation: 10624
The following should work:
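# Per-group minimum of b and maximum of c, indexed by a
df2 = df.groupby('a').agg({'b': 'min', 'c': 'max'})
# Merge back onto the original rows; the _x columns hold the aggregated values
result = df2.merge(df, on='a')[['a', 'b_x', 'c_x', 'd']]
# Restore the original column names
result.columns = df.columns
print(result)
   a  b  c    d
0  a  1  2  1.1
1  b  2  4  1.2
2  b  2  4  1.3
3  c  4  1  1.4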
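A possible variation on the aggregate-then-merge idea, in case the original row order and index need to survive: drop the columns being replaced and join the aggregates back with DataFrame.join. This is a sketch under that assumption, not the answer author's code, and it avoids the _x/_y suffix handling.

```python
import pandas as pd

df = pd.DataFrame({'a': ['a', 'b', 'b', 'c'],
                   'b': [1, 2, 3, 4],
                   'c': [2, 3, 4, 1],
                   'd': [1.1, 1.2, 1.3, 1.4]})

# One row per value of a, holding the min of b and the max of c.
agg = df.groupby('a').agg({'b': 'min', 'c': 'max'})

# Remove the old b and c, look up the aggregates by the value in column a,
# then put the columns back in their original order.
result = df.drop(columns=['b', 'c']).join(agg, on='a')[list(df.columns)]

print(result)
```

Because the join starts from the original frame, the result keeps the index of df.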
Upvotes: 0
Reputation: 863166
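Use GroupBy.transform with min and max. It returns each group's aggregated value repeated on every row of that group, so the result lines up with the original rows and groups with a single row are left unchanged: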
import pandas as pd

df = pd.DataFrame({'a': ['a', 'b', 'b', 'c'],
                   'b': [1, 2, 3, 4],
                   'c': [2, 3, 4, 1],
                   'd': [1.1, 1.2, 1.3, 1.4]})
# Replace b with the per-group minimum and c with the per-group maximum
df['b'] = df.groupby('a')['b'].transform('min')
df['c'] = df.groupby('a')['c'].transform('max')
print(df)
   a  b  c    d
0  a  1  2  1.1
1  b  2  4  1.2
2  b  2  4  1.3
3  c  4  1  1.4
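If the same pattern is needed for several columns, the transform calls can be wrapped in a small helper. The function name fill_group_aggregates and its parameters are hypothetical, chosen only for this sketch; it returns a new frame rather than modifying df in place.

```python
import pandas as pd

def fill_group_aggregates(frame, key, aggs):
    """Return a copy of frame in which each column named in aggs is replaced
    by its per-group aggregate, broadcast to every row of the group.

    Hypothetical helper: key is the grouping column, aggs maps a column
    name to an aggregation name such as 'min' or 'max'.
    """
    grouped = frame.groupby(key)
    new_cols = {col: grouped[col].transform(func) for col, func in aggs.items()}
    return frame.assign(**new_cols)

df = pd.DataFrame({'a': ['a', 'b', 'b', 'c'],
                   'b': [1, 2, 3, 4],
                   'c': [2, 3, 4, 1],
                   'd': [1.1, 1.2, 1.3, 1.4]})

result = fill_group_aggregates(df, 'a', {'b': 'min', 'c': 'max'})
print(result)
```

DataFrame.assign builds a new frame, so df keeps its original values and the helper can be reused with a different mapping.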
Upvotes: 2