From the question "Python: Best Way to remove duplicate character from string", the answer is:
''.join(ch for ch, _ in itertools.groupby(string_to_remove))
I know how to remove duplicated letters that occur only next to each other. How can I apply this solution to a column in pandas?
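For reference, here is a minimal sketch of what the quoted one-liner does on a single string. The helper name collapse_runs and the variable runs are made up for illustration; only itertools.groupby comes from the quoted answer. Note that the comparison is case-sensitive, so 'M' and 'm' count as different characters.

```python
import itertools

def collapse_runs(text):
    """Collapse each run of identical adjacent characters to one character."""
    # groupby yields (key, group) pairs for consecutive equal items;
    # keeping only the key drops the repeats inside each run.
    return ''.join(ch for ch, _ in itertools.groupby(text))

# Show the runs that groupby finds in 'ODOODY' as (character, run length).
runs = [(ch, len(list(group))) for ch, group in itertools.groupby('ODOODY')]
print(runs)
print(collapse_runs('ODOODY'))
```

Only the adjacent 'OO' is collapsed; the earlier 'O' stays because a 'D' separates it from that run.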
df:
df=pd.DataFrame({'A':['ODOODY','LLHHEELLO'],'B':['NNMminee','DDasdss']})
expected result:
A,B
ODODY,NMine
LHELO,Dasds
tried:
df['A'] = df['A'].apply(lambda x: ''.join(ch for ch, _ in itertools.groupby(x['A'])))
Thanks!
Upvotes: 1
Views: 803
Reputation: 862441
Use DataFrame.applymap; if necessary, select only the columns from which duplicates should be removed:
import itertools
cols = ['A','B']
df[cols] = df[cols].applymap(lambda x: ''.join(ch for ch, _ in itertools.groupby(x)))
# for all columns
# df = df.applymap(lambda x: ''.join(ch for ch, _ in itertools.groupby(x)))
print(df)
       A       B
0  ODODY  NMmine
1  LHELO   Dasds
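A note for newer pandas versions: DataFrame.applymap was deprecated in pandas 2.1 in favour of DataFrame.map. The sketch below picks whichever method is available; the helper name collapse_runs and the hasattr check are illustrative choices, not part of the answer above.

```python
import itertools
import pandas as pd

def collapse_runs(value):
    # Keep one character per run of identical adjacent characters.
    return ''.join(ch for ch, _ in itertools.groupby(value))

df = pd.DataFrame({'A': ['ODOODY', 'LLHHEELLO'], 'B': ['NNMminee', 'DDasdss']})
cols = ['A', 'B']

if hasattr(pd.DataFrame, 'map'):
    # pandas >= 2.1: element-wise DataFrame.map
    df[cols] = df[cols].map(collapse_runs)
else:
    # older pandas: element-wise DataFrame.applymap
    df[cols] = df[cols].applymap(collapse_runs)

print(df)
```

Both branches apply the function to every cell of the selected columns, so the result should match the output shown above.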
A solution with DataFrame.apply is also possible, but each value has to be processed separately, so a list comprehension is added:
df[cols] = df[cols].apply(lambda x: [''.join(ch for ch, _ in itertools.groupby(y)) for y in x])
print(df)
       A       B
0  ODODY  NMmine
1  LHELO   Dasds
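The reason each value needs separate handling is that DataFrame.apply passes a whole column (a Series) to the function, not a single string. As a rough alternative to the list comprehension, Series.map can be used inside the lambda; the names collapse_runs and result are hypothetical.

```python
import itertools
import pandas as pd

def collapse_runs(value):
    # Keep one character per run of identical adjacent characters.
    return ''.join(ch for ch, _ in itertools.groupby(value))

df = pd.DataFrame({'A': ['ODOODY', 'LLHHEELLO'], 'B': ['NNMminee', 'DDasdss']})
cols = ['A', 'B']

# DataFrame.apply hands each column to the lambda as a Series,
# so Series.map is used to reach the individual strings.
result = df[cols].apply(lambda column: column.map(collapse_runs))
print(result)
```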
Or use Series.apply:
f = lambda x: ''.join(ch for ch, _ in itertools.groupby(x))
df['A'] = df['A'].apply(f)
df['B'] = df['B'].apply(f)
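One thing to watch: the expected result in the question lists NMine for 'NNMminee', while the solutions above return NMmine because groupby compares characters case-sensitively. If NMine is really what is wanted (an assumption, it may simply be a typo), a case-insensitive variant could look like the sketch below. The function name collapse_runs_ignore_case is made up for illustration.

```python
import itertools
import pandas as pd

def collapse_runs_ignore_case(value):
    # key=str.lower puts 'M' and 'm' into the same run;
    # next(group) keeps the first character of each run as written.
    return ''.join(next(group) for _, group in itertools.groupby(value, key=str.lower))

df = pd.DataFrame({'A': ['ODOODY', 'LLHHEELLO'], 'B': ['NNMminee', 'DDasdss']})

for col in ['A', 'B']:
    df[col] = df[col].apply(collapse_runs_ignore_case)

print(df)
```

With the sample data this gives NMine in column B, matching the expected result stated in the question.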
Upvotes: 1