Yehoshaphat Schellekens

Reputation: 2385

Apply .value_counts() on DataFrame with lists populated in each cell

I'm working with a DataFrame that has a column whose cells each contain a list of strings. I'd like to apply value_counts() to it as if all the lists had been concatenated into a single large list (I tried doing that, but it didn't work very well).

Toy example of the data structure that I have:

import pandas as pd
df_list = pd.DataFrame({'listcol':[['a','b','c'],['a','b','c']]})
print(df_list)
     listcol
0  [a, b, c]
1  [a, b, c]

I would like value_counts() to behave as if the column were one big concatenated list, as follows:

#desired output:
df=pd.DataFrame(['a','b','c','a','b','c'])
df.columns = ['col']
df.col.value_counts() #desired output!
b    2
c    2
a    2

Thanks in advance!

Upvotes: 4

Views: 1002

Answers (1)

jezrael

Reputation: 862611

I think you first need to flatten the lists, then apply Counter, and finally create a Series:

from itertools import chain
from collections import Counter

print (Counter(chain.from_iterable(df_list['listcol'])))
Counter({'b': 2, 'a': 2, 'c': 2})

s = pd.Series(Counter(chain.from_iterable(df_list['listcol'])))
print (s)
a    2
b    2
c    2
dtype: int64
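
If the counts should come out highest first, as value_counts() reports them, one option is Counter.most_common(). This is only a sketch: the second row of the toy frame is shortened here (an assumption, not the data from the question) so that the counts differ, and the names `counts`, `ordered` and `s` are illustrative.

```python
from collections import Counter
from itertools import chain

import pandas as pd

# Toy data; the second list is shortened so the counts are not all equal.
df_list = pd.DataFrame({'listcol': [['a', 'b', 'c'], ['a', 'b']]})

# Flatten the lists and count each value.
counts = Counter(chain.from_iterable(df_list['listcol']))

# most_common() returns (value, count) pairs, highest count first.
ordered = counts.most_common()

# Build a Series from the ordered pairs.
s = pd.Series(dict(ordered))
print(s)
```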

Or create Series and use value_counts:

#for python 2 omit list
s = pd.Series(list(chain.from_iterable(df_list['listcol'])))
print (s)
0    a
1    b
2    c
3    a
4    b
5    c
dtype: object

print (s.value_counts())
c    2
a    2
b    2
dtype: int64
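
On newer pandas versions (0.25 and later, which is an assumption about your environment), Series.explode() can do the flattening without itertools. A minimal sketch using the toy frame from the question:

```python
import pandas as pd

# Same toy frame as in the question.
df_list = pd.DataFrame({'listcol': [['a', 'b', 'c'], ['a', 'b', 'c']]})

# explode() turns each list element into its own row,
# then value_counts() counts the flattened values.
counts = df_list['listcol'].explode().value_counts()

# sort_index() only makes the printed order predictable when counts tie.
print(counts.sort_index())
```

Note that explode() repeats the original index for every element of a list, which does not matter here because value_counts() ignores the index.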

Upvotes: 6
