Amr Rady

Reputation: 1197

How can I get the frequency distribution of each element in lists in a pandas data frame?

I am trying to get the frequency distribution of the tags in this data frame.


The problem is that each row contains a list of tags, not just one, so I can't use

df['Tags'].value_counts()

So how can I do that?

Upvotes: 1

Views: 837

Answers (1)

jezrael

Reputation: 862511

For pandas 0.25+ use Series.explode:

s = df['Tags'].explode().value_counts()

Another solution uses the DataFrame constructor with DataFrame.stack, which also works for versions below 0.25:

s = pd.DataFrame(df['Tags'].tolist()).stack().value_counts()
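
To see why this works even when the lists have different lengths, here is a minimal sketch using the same data as the sample further down. The intermediate names wide and s_stack are only for illustration. The constructor pads shorter rows with missing values, and those never reach the final counts, because value_counts ignores missing values by default (and stack traditionally drops them as well).

```python
import pandas as pd

df = pd.DataFrame({'Tags': [['a', 'b'], ['a', 'b', 'c'], ['c', 'b', 'c'], ['c']]})

# One column per list position; shorter lists are padded with missing values
wide = pd.DataFrame(df['Tags'].tolist())

# Reshape to one long Series, then count; missing values are not counted
s_stack = wide.stack().value_counts()
print(s_stack)
```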

Or it is possible to use pure Python with Counter and flattening:

from collections import Counter

s = pd.Series(Counter([y for x in df['Tags'] for y in x]))
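
Note that a Series built from a Counter keeps the order in which tags were first seen, rather than sorting by count as value_counts does. If the sorted order matters, a small sketch like the following (assuming the same sample data as below; s_counter is just an illustrative name) adds an explicit sort:

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({'Tags': [['a', 'b'], ['a', 'b', 'c'], ['c', 'b', 'c'], ['c']]})

# Flatten the lists and count every tag
counts = Counter(tag for tags in df['Tags'] for tag in tags)

# Sort by frequency, highest first, to mimic the ordering of value_counts
s_counter = pd.Series(counts).sort_values(ascending=False)
print(s_counter)
```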

Sample:

import pandas as pd

df = pd.DataFrame({'Tags': [['a', 'b'], ['a', 'b', 'c'], ['c', 'b', 'c'], ['c']]})
s = df['Tags'].explode().value_counts()
print(s)
c    4
b    3
a    2
Name: Tags, dtype: int64

Upvotes: 2
