Edamame

Reputation: 25416

Pandas: Aggregate the values of a column

I have the following data frame:

name      pet
----------------
John      ['cat']
Mary      ['cat','dog','bird']
Ann       ['bird','rat']
Dave      ['cow','dog']

For each person, the column pet is a list of animals. I need to get a final list of all pets (no duplicates):

final_list = ['cat', 'dog', 'bird', 'rat', 'cow']

Is there a more elegant way to achieve this (than e.g. naively looping over the dataframe row by row)?

Upvotes: 2

Views: 73

Answers (3)

Quantum_Something

Reputation: 116

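You could also stay within pandas. Each cell of pet holds a list, and lists are unhashable, so calling unique() on the column directly raises a TypeError. Flatten the column with explode() first (available in pandas 0.25 and later):

df.pet.explode().unique()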
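
For anyone who wants to try the pandas-only route, here is a minimal sketch. It assumes the data frame is rebuilt by hand from the table in the question (the construction below is my reconstruction, not the asker's code) and that pandas 0.25 or later is installed, since it relies on Series.explode():

```python
import pandas as pd

# Hypothetical reconstruction of the data frame from the question
df = pd.DataFrame({
    "name": ["John", "Mary", "Ann", "Dave"],
    "pet": [["cat"], ["cat", "dog", "bird"], ["bird", "rat"], ["cow", "dog"]],
})

# explode() gives every list element its own row; unique() then keeps
# the distinct values in order of first appearance
final_list = df["pet"].explode().unique().tolist()
print(final_list)
```

With this data the result comes out in first-seen order, which matches the final_list shown in the question.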

Upvotes: 0

iacob

Reputation: 24371

You can use the tolist method to get a list of all the values, flatten them with itertools.chain.from_iterable, and then convert to a set to get the unique values:

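import itertools

# One inner list per row
pet_lists = df['pet'].tolist()
# Flatten the inner lists, then deduplicate via a set
final_list = list(set(itertools.chain.from_iterable(pet_lists)))
print(final_list)
# e.g. ['cat', 'dog', 'bird', 'rat', 'cow'] (a set is unordered, so the order may differ)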
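
If the order of first appearance matters (as in the final_list from the question), one possible variant swaps the set for dict.fromkeys, which deduplicates while keeping insertion order on Python 3.7+. This is a sketch under that assumption, with the data frame rebuilt by hand from the question's table:

```python
import itertools

import pandas as pd

# Hypothetical reconstruction of the data frame from the question
df = pd.DataFrame({
    "name": ["John", "Mary", "Ann", "Dave"],
    "pet": [["cat"], ["cat", "dog", "bird"], ["bird", "rat"], ["cow", "dog"]],
})

# Iterating the column yields one list per row; chain them into one stream
flat = itertools.chain.from_iterable(df["pet"])

# dict keys are unique and remember insertion order, unlike a set
final_list = list(dict.fromkeys(flat))
print(final_list)
```

The trade-off is purely about ordering: the set version is just as correct when any order will do.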

Upvotes: 3

sacuL

Reputation: 51425

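You could also do this in a list comprehension (though @ukemi's method is more elegant). Expanding the lists with apply(pd.Series) pads the shorter rows with NaN, so the comprehension keeps only the string values:

>>> [i for i in set(df.pet.apply(pd.Series).values.flatten().tolist()) if isinstance(i, str)]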
['cat', 'bird', 'cow', 'dog', 'rat']

Upvotes: 2
