Reputation: 1293
I have a Pandas Dataframe in the below format.
[apple]
[banana]
[apple, orange]
I would like to convert this such that it has only unique values but it split by row for each value:
apple
banana
orange
Upvotes: 0
Views: 100
Reputation: 75080
You can use itertools.chain
and from_iterable()
to flatten list of lists and the OrderedDict
to drop duplicates maintaining order:
from collections import OrderedDict
import itertools
df['col2']=OrderedDict.fromkeys(itertools.chain.from_iterable(df.col)).keys()
print(df)
col col2
0 [apple] apple
1 [banana] banana
2 [apple, orange] orange
Upvotes: 2
Reputation: 42886
First unnest
your list to rows, then use drop_duplicates
:
# Make example dataframe
df = pd.DataFrame({'Col1':[['apple'], ['banana'], ['apple', 'orange']]})
Col1
0 [apple]
1 [banana]
2 [apple, orange]
df = explode_list(df, 'Col1').drop_duplicates()
Output
Col1
0 apple
1 banana
2 orange
Function used from linked answer
def explode_list(df, col):
s = df[col]
i = np.arange(len(s)).repeat(s.str.len())
return df.iloc[i].assign(**{col: np.concatenate(s)})
Upvotes: 2