ah bon

Reputation: 10011

Merge a string column into a list of unique values using Python

I have a pandas DataFrame like this:

id     fruits
01     Apple, Apricot
02     Apple, Banana, Clementine, Pear
03     Orange, Pineapple, Pear

How can I get a list of fruits like this, with duplicates removed?

['Apple','Apricot','Banana','Clementine','Orange','Pear','Pineapple']

Upvotes: 3

Views: 142

Answers (3)

Mohamed Thasin ah

Reputation: 11192

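Try this: join all rows into one string, split it on the separator, and pass the result to set to drop duplicates.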

set(', '.join(df['fruits']).split(', '))

Output:

set(['Apple', 'Apricot', 'Pear', 'Pineapple', 'Orange', 'Banana', 'Clementine'])
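A set has no defined order, so the result above will not necessarily match the alphabetical list in the question. A minimal self-contained sketch, assuming the DataFrame from the question is rebuilt as below (the name unique_fruits is only illustrative), sorts the set to get a stable order:

```python
import pandas as pd

# Rebuild the DataFrame from the question (assumed layout).
df = pd.DataFrame({
    "id": ["01", "02", "03"],
    "fruits": [
        "Apple, Apricot",
        "Apple, Banana, Clementine, Pear",
        "Orange, Pineapple, Pear",
    ],
})

# Join every row into one string, split on the separator,
# deduplicate with set, then sort for a deterministic order.
unique_fruits = sorted(set(", ".join(df["fruits"]).split(", ")))
print(unique_fruits)
# ['Apple', 'Apricot', 'Banana', 'Clementine', 'Orange', 'Pear', 'Pineapple']
```

Note that this relies on every row using exactly ", " as the separator; rows with missing values would make the join fail.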

Upvotes: 1

jezrael

Reputation: 862611

You can flatten the lists created by split, convert the result to a set to keep only unique values, and finally convert it back to a list:

a = list(set([item for sublist in df['fruits'].str.split(', ') for item in sublist]))
print(a)
['Pineapple', 'Clementine', 'Apple', 'Banana', 'Apricot', 'Orange', 'Pear']

Or, to keep the values in order of first appearance:

a = df['fruits'].str.split(', ', expand=True).stack().drop_duplicates().tolist()
print(a)
['Apple', 'Apricot', 'Banana', 'Clementine', 'Pear', 'Orange', 'Pineapple']

Thanks to @kabanus for an alternative:

a = list(set(sum(df['fruits'].str.split(', '),[])))
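On newer pandas versions (0.25 and later, which is an assumption about your environment), Series.explode offers another way to flatten the split lists. This is a different technique from the ones above, sketched here against a rebuilt copy of the question's DataFrame; unique() keeps values in order of first appearance:

```python
import pandas as pd

# Rebuild the DataFrame from the question (assumed layout).
df = pd.DataFrame({
    "id": ["01", "02", "03"],
    "fruits": [
        "Apple, Apricot",
        "Apple, Banana, Clementine, Pear",
        "Orange, Pineapple, Pear",
    ],
})

# Split each row into a list, expand each list element into its own row,
# then keep the first occurrence of every value.
a = df["fruits"].str.split(", ").explode().unique().tolist()
print(a)
# ['Apple', 'Apricot', 'Banana', 'Clementine', 'Pear', 'Orange', 'Pineapple']
```

Wrap the result in sorted(...) if the alphabetical order from the question is needed.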

Upvotes: 5

Haleemur Ali

Reputation: 28253

Using str.extractall and drop_duplicates:

df.fruits.str.extractall(r'(\w+)').drop_duplicates()[0].tolist()

Output:

['Apple', 'Apricot', 'Banana', 'Clementine', 'Pear', 'Orange', 'Pineapple']
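One caveat worth checking: the pattern \w+ matches single words, so a value containing a space would be split into two entries. The sketch below uses hypothetical data (the "Passion fruit" rows are not from the question) and an alternative pattern, chosen here only for illustration, that captures everything between commas:

```python
import pandas as pd

# Hypothetical data with a multi-word value, not taken from the question.
s = pd.Series(["Passion fruit, Pear", "Pear, Apple"])

# \w+ stops at the space, so "Passion fruit" becomes two entries.
by_word = s.str.extractall(r"(\w+)").drop_duplicates()[0].tolist()
print(by_word)
# ['Passion', 'fruit', 'Pear', 'Apple']

# Capture runs of non-comma characters that start with a non-space character.
by_item = s.str.extractall(r"([^,\s][^,]*)").drop_duplicates()[0].tolist()
print(by_item)
# ['Passion fruit', 'Pear', 'Apple']
```

For the single-word fruit names in the question, both patterns give the same result.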

Upvotes: 3
