Reputation: 25416
I have the following data frame:
name pet
----------------
John ['cat']
Mary ['cat','dog','bird']
Ann ['bird','rat']
Dave ['cow','dog']
For each person, the column pet is a list of animals. I need to get a final list of all pets (no duplicates):
final_list = ['cat', 'dog', 'bird', 'rat', 'cow']
Is there a more elegant way to achieve this (than e.g. naively looping over the dataframe row by row)?
Upvotes: 2
Views: 73
Reputation: 24371
You can use the tolist method to get a list of all the values, flatten them with itertools.chain, and then convert to a set to get the unique values:
import itertools
dfList = df['pet'].tolist()
final_list = list(set(itertools.chain.from_iterable(dfList)))
print(final_list)
# e.g. ['cat', 'dog', 'bird', 'rat', 'cow'] (a set is unordered, so the order may differ)
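If the order of first appearance matters (as in the final_list shown in the question), one possible variant is to deduplicate with dict.fromkeys instead of set. This is only a sketch: pet_lists is a hypothetical stand-in for df['pet'].tolist(), and unique_in_order is an illustrative helper name, not something from pandas or itertools.

```python
import itertools

# Hypothetical stand-in for df['pet'].tolist() from the question
pet_lists = [['cat'], ['cat', 'dog', 'bird'], ['bird', 'rat'], ['cow', 'dog']]

def unique_in_order(list_of_lists):
    """Flatten nested lists and drop duplicates, keeping first-seen order."""
    flat = itertools.chain.from_iterable(list_of_lists)
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(flat))

final_list = unique_in_order(pet_lists)
print(final_list)
```

With the real data frame, you would pass df['pet'].tolist() in place of pet_lists.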
Upvotes: 3
Reputation: 51425
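You could also do this in a list comprehension (though @ukemi's method is more elegant). The string check filters out the NaN values that apply(pd.Series) uses to pad the shorter lists:
>>> [i for i in set(df.pet.apply(pd.Series).values.flatten().tolist()) if isinstance(i, str)]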
['cat', 'bird', 'cow', 'dog', 'rat']
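Another option, assuming pandas 0.25 or newer (where Series.explode is available), is to let pandas do the flattening. The sketch below rebuilds the data frame from the question so that it runs on its own.

```python
import pandas as pd

# Data frame rebuilt from the question for a self-contained example
df = pd.DataFrame({
    'name': ['John', 'Mary', 'Ann', 'Dave'],
    'pet': [['cat'], ['cat', 'dog', 'bird'], ['bird', 'rat'], ['cow', 'dog']],
})

# explode() gives each list element its own row; unique() drops the repeats
final_list = df['pet'].explode().unique().tolist()
print(final_list)
```

This avoids the intermediate wide frame that apply(pd.Series) creates, so there is no NaN padding to filter out.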
Upvotes: 2