Pandas split value in rows into multiple rows based on delimiter

Question

I have a pandas DataFrame in the format below:

[apple]
[banana]
[apple, orange]

I would like to convert this so that it contains only unique values, with one value per row:

apple
banana
orange

Erfan · Accepted Answer

First unnest the lists into rows, then use drop_duplicates:

import numpy as np
import pandas as pd
# Make example dataframe
df = pd.DataFrame({'Col1': [['apple'], ['banana'], ['apple', 'orange']]})

              Col1
0          [apple]
1         [banana]
2  [apple, orange]

# explode_list is the helper function defined further down
df = explode_list(df, 'Col1').drop_duplicates()

Output

     Col1
0   apple
1  banana
2  orange
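
As an alternative sketch, assuming pandas 0.25 or later: the built-in DataFrame.explode does the same unnesting without a helper function. This swaps in a different technique from the explode_list helper used above, and the names exploded and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Col1': [['apple'], ['banana'], ['apple', 'orange']]})

# DataFrame.explode gives each list element its own row and repeats
# the original index label for elements that came from the same list.
exploded = df.explode('Col1')

# Drop repeated values, then renumber the rows from 0.
result = exploded.drop_duplicates().reset_index(drop=True)
print(result)
```

The reset_index(drop=True) call is optional. It only replaces the repeated index labels left behind by explode with a fresh 0..n-1 index.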

Function used (from the linked answer):

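# Requires numpy imported as np
def explode_list(df, col):
    # Column of lists to unnest
    s = df[col]
    # Repeat each row position once per element in that row's list
    i = np.arange(len(s)).repeat(s.str.len())
    # Duplicate the rows, then overwrite col with the flattened values
    return df.iloc[i].assign(**{col: np.concatenate(s)})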
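
To see what the helper does, here is a step-by-step sketch on the example frame. The names lengths, positions, flat, and out are illustrative and do not appear in the original function.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': [['apple'], ['banana'], ['apple', 'orange']]})
s = df['Col1']

# Number of elements in each row's list: 1, 1, 2
lengths = s.str.len()

# Repeat each row position once per list element: [0, 1, 2, 2]
positions = np.arange(len(s)).repeat(lengths)

# Flatten all lists into one array: apple, banana, apple, orange
flat = np.concatenate(s.tolist())

# Select the repeated rows, overwrite the column with the flat values,
# then keep only the first occurrence of each value.
out = df.iloc[positions].assign(Col1=flat).drop_duplicates()
print(out)
```

Because positions and flat have the same length, every repeated row lines up with exactly one list element, which is what lets assign overwrite the column in a single step.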
