How can I get frequency distribution for each element in lists in pandas data frame?

Question

I am trying to get the frequency distribution of the tags in this data frame.

The problem is that each row contains a list of tags, not just one, so I can't use

df['Tags'].value_counts()

How can I do that?

jezrael · Accepted Answer

For pandas 0.25+ use Series.explode:

s = df['Tags'].explode().value_counts()

Another solution uses the DataFrame constructor and DataFrame.stack, and also works for versions below 0.25:

s = pd.DataFrame(df['Tags'].tolist()).stack().value_counts()

Or it is possible to use pure Python with Counter and a flattening list comprehension:

from collections import Counter

s = pd.Series(Counter([y for x in df['Tags'] for y in x]))
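One thing to keep in mind with the Counter version: the resulting Series keeps the order in which the tags were first seen, while value_counts sorts by count. A minimal sketch of sorting it afterwards, assuming a small made-up list of tag lists (the names tags, counts and s_sorted are only for illustration):

```python
from collections import Counter

import pandas as pd

# Hypothetical input: one list of tags per row
tags = [['a', 'b'], ['a', 'b', 'c'], ['c', 'b', 'c'], ['c']]

# Flatten the nested lists and count each tag
counts = Counter(tag for row in tags for tag in row)

# A Series built from a Counter follows first-seen order (a, b, c),
# so sort explicitly to get the most frequent tag first
s_sorted = pd.Series(counts).sort_values(ascending=False)
print(s_sorted)
```

If only plain Python is needed, counts.most_common() gives the same ordering as a list of (tag, count) pairs.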

Sample:

import pandas as pd

df = pd.DataFrame({'Tags':[['a','b'],['a','b','c'],['c','b','c'], ['c']]})
s = df['Tags'].explode().value_counts()
print(s)
c    4
b    3
a    2
Name: Tags, dtype: int64
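As a quick sanity check, the three approaches can be compared on the same sample data. This is only a sketch: the variable names by_explode, by_stack, by_counter and as_dicts are made up for illustration, and it needs pandas 0.25+ because of explode. The results are compared as dicts because the ordering of the Series can differ between approaches.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({'Tags': [['a', 'b'], ['a', 'b', 'c'], ['c', 'b', 'c'], ['c']]})

# 1) explode each list into its own row, then count
by_explode = df['Tags'].explode().value_counts()

# 2) spread the lists over columns, stack back into one Series, then count
#    (shorter lists are padded with missing values, which are not counted)
by_stack = pd.DataFrame(df['Tags'].tolist()).stack().value_counts()

# 3) flatten in pure Python and count with Counter
by_counter = pd.Series(Counter(tag for tags in df['Tags'] for tag in tags))

# Compare as dicts so that ordering differences do not matter
as_dicts = [s.to_dict() for s in (by_explode, by_stack, by_counter)]
print(as_dicts[0] == as_dicts[1] == as_dicts[2])
```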
