Create term frequency dictionary from dataframe column list

Question

I have a dataframe in which one column holds a list of strings per row, and I want to use collections.Counter to create a term frequency dictionary. The column looks like this:

>>> job_title['title']
0         [responsible, caring, trustworthy, babysitter]
1               [compassionate, trustworthy, babysitter]
2      [family, looking, kindergarten, preschool, chi...
3      [babysitter, needed, 2, children, bee, cave, n...
4             [fun, patient, nonjudgemental, babysitter]
5      [responsible, interactive, intelligent, babysi...
6                    [responsible, friendly, babysitter]
7      [family, looking, kindergarten, preschool, chi...
8      [family, looking, kindergarten, preschool, chi...
9                     [reliable, clean, friendly, nanny]

What's the most efficient way to accomplish this?

jezrael · Accepted Answer

I think you can flatten the lists with chain.from_iterable and then pass the result to Counter:

from itertools import chain
from collections import Counter

print (Counter(chain.from_iterable(job_title.title)))

Sample:

import pandas as pd

job_title = pd.DataFrame({'title':[['responsible', 'caring', 'trustworthy', 'babysitter'],
                                   ['compassionate', 'trustworthy', 'babysitter']]})

print (job_title)
                                            title
0  [responsible, caring, trustworthy, babysitter]
1        [compassionate, trustworthy, babysitter]


print (Counter(chain.from_iterable(job_title.title)))
Counter({'babysitter': 2, 'trustworthy': 2, 
         'compassionate': 1, 'responsible': 1, 'caring': 1})
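
If a plain dict or only the most frequent terms are needed, the Counter can be converted or queried directly. What follows is a minimal sketch rather than part of the approach above: rows is a hypothetical stand-in for job_title['title'].tolist(), and term_counts, term_freq and top_two are illustrative names.

```python
from collections import Counter
from itertools import chain

# Hypothetical stand-in for job_title['title'].tolist(): one list of terms per row.
rows = [
    ['responsible', 'caring', 'trustworthy', 'babysitter'],
    ['compassionate', 'trustworthy', 'babysitter'],
]

# Flatten the rows lazily and count every term in a single pass.
term_counts = Counter(chain.from_iterable(rows))

# Counter is a dict subclass; dict() yields a plain dictionary if one is required.
term_freq = dict(term_counts)

# most_common(n) returns the n highest counts as (term, count) pairs.
top_two = term_counts.most_common(2)
print(term_freq)
print(top_two)
```

Because chain.from_iterable yields terms one at a time, no intermediate flattened list is built, which helps when the column is large.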
