I am trying to get the frequency distribution of the tags in this data frame.
The problem is that each row contains a list of tags, not just one, so I can't simply use
df['Tags'].value_counts()
How can I do that?
For pandas 0.25+, use Series.explode:
s = df['Tags'].explode().value_counts()
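To see why this works, here is a small sketch of the intermediate Series that explode produces (the data is a hypothetical copy of the sample further down). Each list element becomes its own row and the original index is repeated, so value_counts ends up counting individual tags.

```python
import pandas as pd

# Hypothetical sample data: each row holds a list of tags
df = pd.DataFrame({'Tags': [['a', 'b'], ['a', 'b', 'c'], ['c', 'b', 'c'], ['c']]})

# explode turns every list element into its own row; the original index is repeated
exploded = df['Tags'].explode()
print(exploded.tolist())        # ['a', 'b', 'a', 'b', 'c', 'c', 'b', 'c', 'c']
print(exploded.index.tolist())  # [0, 0, 1, 1, 1, 2, 2, 2, 3]

# counting the flattened values gives the tag frequencies
s = exploded.value_counts()
```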
Another solution, using the DataFrame constructor and DataFrame.stack, also works in versions below 0.25:
s = pd.DataFrame(df['Tags'].tolist()).stack().value_counts()
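A sketch of what the constructor step builds, again on a hypothetical copy of the sample data. Lists of different lengths become a wide frame padded with missing values. stack reshapes it to one value per row, and because value_counts ignores missing values by default (dropna=True), the padding is never counted.

```python
import pandas as pd

# Hypothetical sample data: lists of different lengths
df = pd.DataFrame({'Tags': [['a', 'b'], ['a', 'b', 'c'], ['c', 'b', 'c'], ['c']]})

# one column per list position; shorter lists are padded with missing values
wide = pd.DataFrame(df['Tags'].tolist())
print(wide)

# stack gives one tag per row; value_counts skips any missing padding values
s = wide.stack().value_counts()
print(s)
```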
Or you can use pure Python, flattening the lists and counting with Counter:
from collections import Counter
import pandas as pd
# flatten the lists of tags, then count each tag
s = pd.Series(Counter([y for x in df['Tags'] for y in x]))
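One difference worth noting: a Series built from a Counter keeps the order in which tags were first seen, whereas value_counts sorts by descending count. A minimal sketch that sorts with Counter.most_common; it also swaps in itertools.chain.from_iterable as an alternative to the nested comprehension for flattening (my substitution, not part of the answer above).

```python
from collections import Counter
from itertools import chain

import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'Tags': [['a', 'b'], ['a', 'b', 'c'], ['c', 'b', 'c'], ['c']]})

# chain.from_iterable flattens the lists, same result as the nested comprehension
counts = Counter(chain.from_iterable(df['Tags']))

# unsorted: tags appear in first-seen order (a, b, c)
unsorted = pd.Series(counts)

# most_common() returns (tag, count) pairs sorted by descending count
s = pd.Series(dict(counts.most_common()))
print(s)
```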
Sample:
import pandas as pd
df = pd.DataFrame({'Tags':[['a','b'],['a','b','c'],['c','b','c'], ['c']]})
s = df['Tags'].explode().value_counts()
print(s)
c    4
b    3
a    2
Name: Tags, dtype: int64
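Since the question asks for a frequency distribution, value_counts(normalize=True) may also help: it returns each tag's share of all tag occurrences instead of raw counts. The sketch below also adds a hypothetical row with an empty list; explode turns it into NaN, which value_counts skips by default. Note that on pandas 2.0 and later the printed result is named count (or proportion with normalize=True) rather than Tags.

```python
import pandas as pd

# Same sample as above, plus a hypothetical row with no tags at all
df = pd.DataFrame({'Tags': [['a', 'b'], ['a', 'b', 'c'], ['c', 'b', 'c'], ['c'], []]})

# explode turns the empty list into NaN, which value_counts ignores by default
counts = df['Tags'].explode().value_counts()

# normalize=True divides each count by the total number of tag occurrences (9 here)
shares = df['Tags'].explode().value_counts(normalize=True)
print(shares.round(3))
```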