Reputation: 1133
I have a dataframe with a column whose values are lists of strings, and I want to use collections.Counter to create a term frequency dictionary. The dataframe looks like the following:
>>> job_title['title']
0 [responsible, caring, trustworthy, babysitter]
1 [compassionate, trustworthy, babysitter]
2 [family, looking, kindergarten, preschool, chi...
3 [babysitter, needed, 2, children, bee, cave, n...
4 [fun, patient, nonjudgemental, babysitter]
5 [responsible, interactive, intelligent, babysi...
6 [responsible, friendly, babysitter]
7 [family, looking, kindergarten, preschool, chi...
8 [family, looking, kindergarten, preschool, chi...
9 [reliable, clean, friendly, nanny]
What's the most efficient way to accomplish this?
Upvotes: 3
Views: 131
Reputation: 863166
I think you can flatten the lists with chain.from_iterable and then use Counter:
from itertools import chain
from collections import Counter
print (Counter(chain.from_iterable(job_title.title)))
Sample:
import pandas as pd
job_title = pd.DataFrame({'title':[['responsible', 'caring', 'trustworthy', 'babysitter'],
                                   ['compassionate', 'trustworthy', 'babysitter']]})
print (job_title)
title
0 [responsible, caring, trustworthy, babysitter]
1 [compassionate, trustworthy, babysitter]
print (Counter(chain.from_iterable(job_title.title)))
Counter({'babysitter': 2, 'trustworthy': 2,
'compassionate': 1, 'responsible': 1, 'caring': 1})
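If you want to reuse this, the same idea can be wrapped in a small function. This is only a minimal sketch: term_frequencies is a hypothetical helper name, and plain lists stand in for job_title['title'] so the snippet runs without pandas. A Series of lists should be usable the same way, since iterating it yields each row's list.

```python
from collections import Counter
from itertools import chain


def term_frequencies(rows):
    """Count tokens across an iterable of token lists (hypothetical helper)."""
    # chain.from_iterable lazily flattens the lists; Counter tallies each token.
    return Counter(chain.from_iterable(rows))


# Stand-in for job_title['title']: one list of tokens per row.
rows = [
    ['responsible', 'caring', 'trustworthy', 'babysitter'],
    ['compassionate', 'trustworthy', 'babysitter'],
]

counts = term_frequencies(rows)
top_two = counts.most_common(2)  # the two most frequent terms, highest count first
as_dict = dict(counts)           # plain dict, if a Counter is not wanted
print(top_two)
print(as_dict)
```

Because chain.from_iterable is lazy, the flattened list is never built in memory, which helps when the column is large.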
Upvotes: 2