Reputation: 167
How can I remove duplicates column by column in a pandas DataFrame, so that:
set1    set2    set3    set4
apple   apple   orange  orange
apple   orange  banana  orange
orange  banana  pear
banana  banana  lemon
pear            lemon
grape           lemon
becomes:
set1    set2    set3    set4
apple   apple   orange  orange
orange  orange  banana
banana  banana  pear
pear            lemon
grape
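For anyone reproducing this, the input can be built roughly as follows. This is only a sketch: it assumes the blank cells are missing values (NaN) and that the blanks fall in set2 and set4, which is what the expected output implies.

```python
import numpy as np
import pandas as pd

# Assumption: blank cells in the table are missing values (NaN),
# placed in set2 and set4 as implied by the expected output.
df = pd.DataFrame({
    "set1": ["apple", "apple", "orange", "banana", "pear", "grape"],
    "set2": ["apple", "orange", "banana", "banana", np.nan, np.nan],
    "set3": ["orange", "banana", "pear", "lemon", "lemon", "lemon"],
    "set4": ["orange", "orange", np.nan, np.nan, np.nan, np.nan],
})
print(df)
```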
Upvotes: 3
Views: 97
Reputation: 294258
itertools.zip_longest
from itertools import zip_longest
pd.DataFrame(
[*zip_longest(*({*df[c].dropna()} for c in df))],
columns=[*df]
)
set1 set2 set3 set4
0 banana orange banana orange
1 grape banana lemon None
2 pear apple pear None
3 apple None orange None
4 orange None None None
collections.defaultdict and itertools.count
# %%timeit
from collections import defaultdict
from itertools import count
i = defaultdict(count)
pd.DataFrame({c: {next(i[c]): v for v in {*df[c].dropna()}} for c in df})
set1 set2 set3 set4
0 pear apple orange orange
1 grape banana lemon NaN
2 apple orange banana NaN
3 banana NaN pear NaN
4 orange NaN NaN NaN
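Both snippets above build a set per column, so the order of values within each column is arbitrary, as the outputs show. If the original order matters, one possible variant is to swap the set for dict.fromkeys, which keeps first-seen order. The sketch below assumes df looks like the frame in the question (blanks as NaN); the names uniques and result are just illustrative.

```python
from itertools import zip_longest

import numpy as np
import pandas as pd

# Assumed reconstruction of the question's input; blanks are NaN.
df = pd.DataFrame({
    "set1": ["apple", "apple", "orange", "banana", "pear", "grape"],
    "set2": ["apple", "orange", "banana", "banana", np.nan, np.nan],
    "set3": ["orange", "banana", "pear", "lemon", "lemon", "lemon"],
    "set4": ["orange", "orange", np.nan, np.nan, np.nan, np.nan],
})

# dict.fromkeys keeps the first occurrence of each value in insertion order,
# unlike a set, whose iteration order is arbitrary.
uniques = (dict.fromkeys(df[c].dropna()) for c in df)

# zip_longest pads the shorter columns with None.
result = pd.DataFrame(list(zip_longest(*uniques)), columns=list(df))
print(result)
```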
Upvotes: 3
Reputation: 33
You can also use drop_duplicates:
df.apply(lambda x : x.drop_duplicates().reset_index(drop=True))
set1 set2 set3 set4
0 apple apple orange orange
1 orange orange banana NaN
2 banana banana pear NaN
3 pear NaN lemon NaN
4 grape NaN NaN NaN
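One caveat: drop_duplicates treats NaN as a value and keeps its first occurrence, so a NaN in the middle of a column would leave a gap. That does not show up with the question's data, where the blanks sit at the bottom. A possible variant, sketched on a made-up frame (demo is hypothetical, not from the question), drops NaN first:

```python
import numpy as np
import pandas as pd

# Hypothetical frame (not from the question) with a NaN in the middle of "a".
demo = pd.DataFrame({
    "a": ["apple", np.nan, "apple", "orange"],
    "b": ["pear", "pear", "lemon", "grape"],
})

# drop_duplicates alone keeps the first NaN, leaving a hole in column "a".
with_gap = demo.apply(lambda x: x.drop_duplicates().reset_index(drop=True))

# Dropping NaN first compacts each column before it is re-indexed.
compact = demo.apply(
    lambda x: x.dropna().drop_duplicates().reset_index(drop=True)
)

print(with_gap)
print(compact)
```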
Upvotes: 1
Reputation: 323226
Here is another way, using pivot:
df.melt().dropna().drop_duplicates(['variable','value']).\
assign(key=lambda x : x.groupby('variable').cumcount()).pivot(index='key',columns='variable',values='value')
Out[806]:
variable set1 set2 set3 set4
key
0 apple apple orange orange
1 orange orange banana NaN
2 banana banana pear NaN
3 pear NaN lemon NaN
4 grape NaN NaN NaN
Upvotes: 3
Reputation: 75080
Use:
m=df.apply(lambda x:dict.fromkeys(x).keys())
pd.DataFrame(m.values.tolist(),index=m.index).T
Or a better way, courtesy of @piRSquared:
pd.DataFrame.from_dict({k: {*df[k].dropna()} for k in df}, orient='index').T
set1 set2 set3 set4
0 apple apple orange orange
1 orange orange banana NaN
2 banana banana pear None
3 pear NaN lemon None
4 grape None None None
Upvotes: 3