Reputation: 79
In Python, I have a DataFrame like this:
 | Fruits |
---|---|
James | [Apple, Pear, Apple] |
Peter | [Apple, Pear, Apple] |
I would like to count both Apple and Pear in each row. I would appreciate any help with this. The desired output is:
 | Fruits | Apple | Pear |
---|---|---|---|
James | [Apple, Pear, Apple] | 2 | 1 |
Peter | [Apple, Pear, Apple] | 2 | 1 |
I tried the following:
d['Apple'] = (d.Fruits == 'Apple').sum()
and
d['Apple'] = (d.Fruits.values == 'Apple').sum()
Upvotes: 3
Views: 64
Reputation: 188
For any list you can use collections.Counter.
The logic is simple: Counter(items) returns a dict-like mapping from each item to the number of times it occurs.
You can loop over the lists in the column and apply Counter to each one to get the counts you want.
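As a rough sketch of that idea, assuming the frame from the question (rebuilt here by hand, with the names as the index), Counter can be applied to each row's list and the two counts pulled out:

```python
from collections import Counter

import pandas as pd

# Sample frame rebuilt from the question (assumed layout: names as the index)
df = pd.DataFrame(
    {"Fruits": [["Apple", "Pear", "Apple"], ["Apple", "Pear", "Apple"]]},
    index=["James", "Peter"],
)

# One Counter per row, built from that row's list
counts = df["Fruits"].apply(Counter)

# Indexing a Counter with a missing key gives 0, so absent fruits are safe
df["Apple"] = counts.apply(lambda c: c["Apple"])
df["Pear"] = counts.apply(lambda c: c["Pear"])
print(df)
```

This only adds the columns you name explicitly, which may be all you need for two fruits.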
Upvotes: 1
Reputation: 75080
You can use explode and groupby.value_counts with unstack:
out = (df.join(df['Fruits'].explode().groupby(level=0).value_counts()
.unstack(fill_value=0)))
print(out)
Fruits Apple Pear
James [Apple, Pear, Apple] 2 1
Peter [Apple, Pear, Apple] 2 1
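To make the chain easier to follow, here is the same pipeline split into steps. This is only a sketch: the sample frame is made up (Peter's list is shortened on purpose so that fill_value=0 has something to fill), and the intermediate variable names are mine.

```python
import pandas as pd

# Hypothetical sample: Peter has no Pear, to show what fill_value=0 does
df = pd.DataFrame(
    {"Fruits": [["Apple", "Pear", "Apple"], ["Apple"]]},
    index=["James", "Peter"],
)

# 1. One row per list element; the original index label is repeated
exploded = df["Fruits"].explode()

# 2. Count each fruit within each original row (grouping on the index)
per_row = exploded.groupby(level=0).value_counts()

# 3. Pivot the fruit level into columns; missing combinations become 0
counts = per_row.unstack(fill_value=0)

# 4. Attach the count columns to the original frame by index
out = df.join(counts)
print(out)
```

Without fill_value=0 the missing Peter/Pear combination would come out as NaN and the columns would become floats.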
Upvotes: 2
Reputation: 61910
Use value_counts + concat:
res = pd.concat((df, df['Fruits'].apply(pd.Series.value_counts)), axis=1)
print(res)
Output
Fruits Apple Pear
James [Apple, Pear, Apple] 2 1
Peter [Apple, Pear, Apple] 2 1
A more general approach, which puts 0 instead of NaN where a fruit is missing from a row, is:
res = pd.concat((df, df['Fruits'].apply(pd.Series.value_counts).fillna(0)), axis=1)
print(res)
Upvotes: 2
Reputation: 862581
A solution if performance is important and you need to count all values:
from collections import Counter
df = df.join(pd.DataFrame([Counter(x) for x in df.Fruits.to_numpy()], index=df.index))
print(df)
Fruits Apple Pear
James [Apple, Pear, Apple] 2 1
Peter [Apple, Pear, Apple] 2 1
If you want to test values separately:
df['Apple'] = df.Fruits.apply(lambda x: sum(y == 'Apple' for y in x))
df['Pear'] = df.Fruits.apply(lambda x: sum(y == 'Pear' for y in x))
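A possible variant of the two lines above, assuming every cell in Fruits really is a Python list: list.count does the same comparison, and a loop avoids repeating the line per fruit. The sample frame and the list of fruit names are written out by hand here.

```python
import pandas as pd

# Sample frame rebuilt from the question (assumed layout: names as the index)
df = pd.DataFrame(
    {"Fruits": [["Apple", "Pear", "Apple"], ["Apple", "Pear", "Apple"]]},
    index=["James", "Peter"],
)

# Fruits to count; this list is chosen by hand, not derived from the data
for fruit in ["Apple", "Pear"]:
    # list.count returns how many elements equal `fruit` (0 if none)
    df[fruit] = df["Fruits"].apply(lambda items, f=fruit: items.count(f))

print(df)
```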
Upvotes: 2