Pandas drop duplicates with groupby and according to various conditions

Question

I have DataFrame:

df = pd.read_csv(...)

a   b   c      d     e     f      
1  two  adc   aaaa   Nan   mmm    
2  one  Nan   aaa    Nan   nnn    
1  one  ab    Nan    Nan   ww     
1  two  abcd  aaa    ff    uiww  
1  two  a     aaa    d     iii

I want to drop duplicates according to 'a' and 'b'.

df = df.drop_duplicates(['a', 'b'])

But I want to leave in columns 'c', 'd' and 'e' value with max length.
And I want to leave in column 'f': value that does not contain ('m' or 'n') or the value that contains ('w' or 'y'). If the previous conditions are not met, then take any value.

I want to get this result:

a   b   c      d     e     f      
1  two  abcd  aaaa   ff    uiww   
2  one  Nan   aaa    Nan   nnn    
1  one  ab    Nan    Nan   ww

I tried to use transform and apply, but it was not possible to reduce to one scheme. What would be the most efficient way to achieve this?

BENY · Accepted Answer

Based on your conditions create the function , then using agg with groupby

def yourfunc1(x):
    return x.loc[x.str.len().idxmax()]
def yourfunc2(x):
    if any(x.str.contains('w|y')|(~x.str.contains('m|n'))):
       return x.loc[x.str.contains('w|y')|(~x.str.contains('m|n'))].iloc[0]
    else :
        return x.iloc[0]
df=df.replace({'Nan':''})
s=df.groupby(['a','b'],as_index=False).agg({'c':yourfunc1,'d':yourfunc1,'e':yourfunc1,'f':yourfunc2})
   a    b     c     d   e     f
0  1  one    ab              ww
1  1  two  abcd  aaaa  ff  uiww
2  2  one         aaa       nnn

Pandas drop duplicates with groupby and according to various conditions

Answers (2)

Related Questions