Reputation: 10011
I have a pandas DataFrame like this:
id  fruits
01  Apple, Apricot
02  Apple, Banana, Clementine, Pear
03  Orange, Pineapple, Pear
How can I get a list of the unique fruits, like this, with duplicates removed?
['Apple','Apricot','Banana','Clementine','Orange','Pear','Pineapple']
Upvotes: 3
Views: 142
Reputation: 11192
Try this:
set(', '.join(df['fruits']).split(', '))
Output:
set(['Apple', 'Apricot', 'Pear', 'Pineapple', 'Orange', 'Banana', 'Clementine'])
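Note that this returns a set, which has no guaranteed order. If a sorted list is needed, as in the question's expected output, the set can be passed to sorted(). A minimal sketch, assuming the DataFrame from the question (rebuilt here so the snippet runs on its own; the name unique_fruits is just for illustration):

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    'id': ['01', '02', '03'],
    'fruits': ['Apple, Apricot',
               'Apple, Banana, Clementine, Pear',
               'Orange, Pineapple, Pear'],
})

# Join all rows into one string, split it back into names,
# deduplicate with set(), then sort alphabetically into a list
unique_fruits = sorted(set(', '.join(df['fruits']).split(', ')))
print(unique_fruits)
```

This relies on every row using exactly ', ' as the separator and on the column containing no missing values.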
Upvotes: 1
Reputation: 862611
You can flatten the lists created by split, convert the result to a set to keep only unique values, and finally convert it back to a list:
a = list(set([item for sublist in df['fruits'].str.split(', ') for item in sublist]))
print (a)
['Pineapple', 'Clementine', 'Apple', 'Banana', 'Apricot', 'Orange', 'Pear']
Or:
a = df['fruits'].str.split(', ', expand=True).stack().drop_duplicates().tolist()
print (a)
['Apple', 'Apricot', 'Banana', 'Clementine', 'Pear', 'Orange', 'Pineapple']
Thanks to @kabanus for an alternative:
a = list(set(sum(df['fruits'].str.split(', '),[])))
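Not part of the original answer, but on pandas 0.25 or newer the flattening step can probably also be done with Series.explode, which keeps the order of first appearance like the stack/drop_duplicates version. A rough sketch, assuming the same sample data as in the question:

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    'id': ['01', '02', '03'],
    'fruits': ['Apple, Apricot',
               'Apple, Banana, Clementine, Pear',
               'Orange, Pineapple, Pear'],
})

# Split each row into a list, put one fruit per row with explode(),
# then keep unique values in order of first appearance
a = df['fruits'].str.split(', ').explode().unique().tolist()
print(a)
```

Unlike the set-based versions, the result order here is deterministic, which may matter if the list is compared or displayed.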
Upvotes: 5
Reputation: 28253
Using str.extractall and drop_duplicates:
df.fruits.str.extractall(r'(\w+)').drop_duplicates()[0].tolist()
Output:
['Apple', 'Apricot', 'Banana', 'Clementine', 'Pear', 'Orange', 'Pineapple']
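One caveat: the pattern \w+ matches single words, so a multi-word name would be split into separate entries. If that can occur in your data, a pattern that captures everything between commas may work better. A minimal sketch with made-up data (the 'Passion Fruit' rows and the pattern are assumptions for illustration, not from the question):

```python
import pandas as pd

# Hypothetical data containing a multi-word name (not from the question)
df = pd.DataFrame({'fruits': ['Apple, Passion Fruit', 'Passion Fruit, Pear']})

# Skip leading whitespace, then capture everything up to the next comma
pattern = r'\s*([^,]+)'

unique_fruits = (
    df['fruits'].str.extractall(pattern)[0]  # column 0 holds the captured group
    .str.strip()                             # drop any trailing whitespace
    .drop_duplicates()
    .tolist()
)
print(unique_fruits)
```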
Upvotes: 3