Reputation: 105
I'm trying to run value_counts on a specific column in my dataframe.
For example:
<Fruit>
0 'apple'
1 'apple, orange'
2 'orange'
How do I count the values so that an item is still counted when it appears inside a comma-separated list? The above should give me:
'apple' 2
'orange' 2
I tried turning each string into a list, but I'm not sure how to run value_counts over fields that hold a list of values.
Upvotes: 5
Views: 16014
Reputation: 128948
Here is a pandonic (idiomatic pandas) way:
In [8]: s
Out[8]:
0 apple
1 apple, orange
2 orange
dtype: object
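Split the strings on their separators, turn each resulting list into a Series, count the values in each, and sum the counts. (Series here is pandas.Series, e.g. via from pandas import Series.) The result is float64 because fruits missing from a row show up as NaN before the sum.
In [9]: s.str.split(r',\s+').apply(lambda x: Series(x).value_counts()).sum()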
Out[9]:
apple 2
orange 2
dtype: float64
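For comparison, here is a minimal sketch of the same split-and-count idea using Series.explode, which is a different technique from the apply/sum above and assumes pandas 0.25 or newer. The variable names are only illustrative, and the separator pattern assumes commas with optional surrounding whitespace.

```python
import pandas as pd

# Same sample data as in the question.
s = pd.Series(["apple", "apple, orange", "orange"])

# Split each entry on commas (allowing optional whitespace around them),
# put every list element on its own row, then count the values.
counts = s.str.split(r"\s*,\s*").explode().value_counts()
print(counts)
```

Because no NaN-filled intermediate frame is built, the counts stay integers rather than floats.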
Upvotes: 6
Reputation: 85603
This is your dataframe:
import pandas as pd
df = pd.DataFrame(['apple', 'apple, orange', 'orange'], columns=['fruit'])
Then join all the entries in the fruit column with commas, remove the extra spaces, and split again to get one flat list of every fruit. Finally, count them:
>>> from collections import Counter
>>> Counter(','.join(df['fruit']).replace(' ', '').split(','))
Counter({'orange': 2, 'apple': 2})
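One caveat with replace(' ', ''): it removes every space, so a multi-word entry such as 'passion fruit' would be turned into 'passionfruit'. A rough sketch of a variant that only trims whitespace around each item is below; count_items is a hypothetical helper name, not part of pandas or collections, and it accepts any iterable of strings (for example df['fruit']).

```python
from collections import Counter


def count_items(entries, sep=","):
    """Count separator-delimited items across an iterable of strings.

    Hypothetical helper for illustration: whitespace around each item is
    trimmed, spaces inside an item are kept, and empty items are skipped.
    """
    counts = Counter()
    for entry in entries:
        parts = (part.strip() for part in entry.split(sep))
        counts.update(part for part in parts if part)
    return counts


fruit_counts = count_items(["apple", "apple, orange", "orange"])
print(fruit_counts)
```

With the dataframe above, count_items(df['fruit']) should give the same counts as the one-liner.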
Upvotes: 0