scott martin
scott martin

Reputation: 1293

Pandas split value in rows into multiple rows based on delimiter

I have a Pandas Dataframe in the below format.

[apple]
[banana]
[apple, orange]

I would like to convert this such that it has only unique values but it split by row for each value:

apple
banana
orange

Upvotes: 0

Views: 100

Answers (2)

anky
anky

Reputation: 75080

You can use itertools.chain and from_iterable() to flatten list of lists and the OrderedDict to drop duplicates maintaining order:

from collections import OrderedDict
import itertools

df['col2']=OrderedDict.fromkeys(itertools.chain.from_iterable(df.col)).keys()
print(df)

               col    col2
0          [apple]   apple
1         [banana]  banana
2  [apple, orange]  orange

Upvotes: 2

Erfan
Erfan

Reputation: 42886

First unnest your list to rows, then use drop_duplicates:

# Make example dataframe
df = pd.DataFrame({'Col1':[['apple'], ['banana'], ['apple', 'orange']]})

              Col1
0          [apple]
1         [banana]
2  [apple, orange]

df = explode_list(df, 'Col1').drop_duplicates()

Output

     Col1
0   apple
1  banana
2  orange

Function used from linked answer

def explode_list(df, col):
    s = df[col]
    i = np.arange(len(s)).repeat(s.str.len())
    return df.iloc[i].assign(**{col: np.concatenate(s)})

Upvotes: 2

Related Questions