Reputation: 5
I have created a list of words associated with a certain category. For example:
care = ["safe", "peace", "empathy"]
And I have a dataframe containing speeches that consist of 450 words on average. I have counted the number of matches for each category using this line of code:
df['Care'] = df['Speech'].apply(lambda x: len([val for val in x.split() if val in care]))
This gives me the total number of matches for each category.
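To illustrate on a toy dataframe (the speeches here are made up), a minimal sketch:
import pandas as pd

care = ["safe", "peace", "empathy"]
df = pd.DataFrame({'Speech': ['a safe and peace loving crowd', 'empathy is safe']})

# exact token comparison, so only whole words are counted
df['Care'] = df['Speech'].apply(lambda x: len([val for val in x.split() if val in care]))
print(df['Care'])  # row 0 -> 2 (safe, peace), row 1 -> 2 (empathy, safe)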
However, I need to review the frequency of each individual word in the list. I tried to solve this with the following code (Tal is the speech column and auktoritet another category list):
df.Tal.str.extractall('({})'.format('|'.join(auktoritet)))\
    .iloc[:, 0].str.get_dummies().sum(level=0)
I've tried different methods, but the problem is that I always get partial matches included: for example, 'ham' would also be counted when 'hammer' appears.
Any ideas on how to solve this?
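To make the failure mode concrete, here is a minimal sketch with made-up data; the second call shows one common remedy, anchoring each alternative with \b word boundaries so only whole words match:
import pandas as pd

words = ['ham']  # hypothetical category list
df = pd.DataFrame({'Speech': ['the hammer fell', 'ham and eggs']})

# bare alternation matches substrings: 'ham' is found inside 'hammer' too
print(df['Speech'].str.extractall('({})'.format('|'.join(words))))

# \b word boundaries keep whole-word matches only: row 0 yields nothing
pattern = r'\b({})\b'.format('|'.join(words))
print(df['Speech'].str.extractall(pattern))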
Upvotes: 0
Views: 89
Reputation: 11
You can use Counter, which is available in the collections package:
from collections import Counter

word_count = Counter()
for line in df['Speech']:         # iterate over every speech
    for word in line.split(' '):  # split each speech into words
        word_count[word] += 1     # tally each occurrence
This stores the count of every word in word_count. Then you can use
word_count.most_common()
to see the words with the highest frequencies.
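For example, word_count.most_common(3) would return something like (illustrative values only):
[('the', 120), ('and', 95), ('to', 80)]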
Upvotes: 1
Reputation: 5
I built on Akash's answer and managed to get the frequencies of the prespecified words stored in a list, counting them in the dataframe, by simply adding one line:
from collections import Counter

word_count = Counter()
for line in df['Speech']:
    for word in line.split(' '):
        if word in care:           # the added line: keep only words from the list
            word_count[word] += 1
word_count.most_common()
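If you also need the counts per speech rather than corpus-wide totals, a minimal sketch building on the question's df and care list, adding one column per category word with the same exact token comparison:
for w in care:
    df[w] = df['Speech'].apply(lambda s: s.split().count(w))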
Upvotes: 0
Reputation: 1940
You could transform each word into a tuple with 1 as the second element, ('word', 1),
and then sum the ones for each distinct word.
The output will be a list of tuples with the words and their frequencies:
[('word1', 3), ('word2', 10), ...]
This is the main idea.
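A minimal sketch of that idea, assuming the df['Speech'] column and care list from the question:
# map step: emit a ('word', 1) pair for every token that is in the category list
pairs = [(word, 1) for speech in df['Speech']
         for word in speech.split() if word in care]

# reduce step: sum the ones per distinct word
freq = {}
for word, one in pairs:
    freq[word] = freq.get(word, 0) + one

# list of (word, frequency) tuples, e.g. [('safe', 3), ('peace', 10), ...]
result = sorted(freq.items(), key=lambda t: t[1], reverse=True)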
Upvotes: 0