John Taylor

Reputation: 737

Remove duplicates from lists in a Pandas column

A possibly far-out Pandas question.

I have a dataframe like this:

Col1               Col2
['joe', 'joe']     ['joe']
['sam', 'bob']     ['sam', 'bob']
['mary', 'mary']   ['mary']

I want to use an apply function on Col1 to get the result in Col2. That is, any list in Col1 that contains duplicates should appear in Col2 without them. I've tried various functions with apply and set, with no luck. It seems like it should be straightforward, but apparently it isn't.

Upvotes: 0

Views: 147

Answers (2)

BENY

Reputation: 323226

To get the second column:

df['Col2'] = df['Col1'].explode().groupby(level=0).unique()
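For reference, here is a minimal sketch of that one-liner applied to the data from the question. The frame construction is my reconstruction of the question's table, and the final conversion to plain lists is an optional extra step I've added, since `unique()` on a groupby returns NumPy arrays rather than lists.

```python
import pandas as pd

# Reconstruction of the question's Col1 (assumed from the table shown there)
df = pd.DataFrame({"Col1": [["joe", "joe"], ["sam", "bob"], ["mary", "mary"]]})

# explode() gives one row per list element and repeats the original index;
# grouping on that index (level=0) and taking unique() drops the duplicates
deduped = df["Col1"].explode().groupby(level=0).unique()

# Optional: turn each NumPy array back into a plain Python list
df["Col2"] = deduped.apply(list)

print(df)
```

Note that `explode()` turns an empty list into NaN, so rows with empty lists may need separate handling.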

Upvotes: 2

wasif

Reputation: 15480

How about applying list(set(x)) to the column? A quick first attempt ;-)

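import pandas as pd

# Sample frame with one list-valued column; the second row has a duplicate
df = pd.DataFrame({
    'A': [[1, 2], [3, 4, 3], [6, 7, 8]]
})
# Deduplicate each list by round-tripping it through a set
df['A'] = df['A'].apply(lambda x: list(set(x)))
print(df)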
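One caveat with the set approach: a set is unordered, so the elements may come back in a different order than they appeared in. If order matters, a possible variant is to go through `dict.fromkeys`, which keeps first-seen order. This is only a sketch; `dedupe_keep_order` is a helper name I made up, and the frame is my reconstruction of the question's data.

```python
import pandas as pd

# Reconstruction of the question's Col1 (assumed from the table shown there)
df = pd.DataFrame({"Col1": [["joe", "joe"], ["sam", "bob"], ["mary", "mary"]]})


def dedupe_keep_order(items):
    """Hypothetical helper: drop duplicates while keeping first-seen order."""
    # dict keys are unique and preserve insertion order (Python 3.7+)
    return list(dict.fromkeys(items))


df["Col2"] = df["Col1"].apply(dedupe_keep_order)
print(df)
```

Like `set`, this requires the list elements to be hashable.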

Still, nothing beats explode:

df['A'].explode().groupby(level=0).unique()

Upvotes: 0
