Geomariachi
Geomariachi

Reputation: 5

Count word frequencies of each word in a list in dataframe

I have created a list of words associated with a certain category. For example:

care = ["safe", "peace", "empathy"]

And I have a dataframe containing speeches, that on average consist of 450 words. I have counted the number of matches for each category using this line of code:

df['Care'] = df['Speech'].apply(lambda x: len([val for val in x.split() if val in care]))

Which gives me the total amount of matches for each category.

However i need to review the frequencies of each word in the list. I tried using this code to solve my problem.

df.Tal.str.extractall('({})'.format('|'.join(auktoritet)))\
                           .iloc[:, 0].str.get_dummies().sum(level=0)

I've tried different methods but the problems is that i always get partial matches included. For example hammer would be counted for ham.

Any ideas on how to solve this?

Upvotes: 0

Views: 89

Answers (3)

Akash Baranwal
Akash Baranwal

Reputation: 11

You can use Counter which is available from collections package

from collections import Counter
word_count=Counter()
for line in df['speech']:
   for word in line.split(' '):
      word_count[word]+=1

it will store count of all words in word_count. Then You can use

word_count.most_common()

to see the words with highest frequency.

Upvotes: 1

Geomariachi
Geomariachi

Reputation: 5

I build on Akash answer, and managed to get the frequencies of prespecified words stored in a list and then counting them in the dataframe, by simply adding a line.

from collections import Counter

word_count=Counter()
for line in df['Speech']:
   for word in line.split(' '):
       if word in care:
           word_count[word]+=1

word_count.most_common()

Upvotes: 0

Henrique Branco
Henrique Branco

Reputation: 1940

You could transform each word in a tuple with 1 as second element ('word', 1) and sum it for each word in list.

The output will be a list of tuples with the words and the frequencies:

[('word1', 3), ('word2', 10) ... ]

This is the main idea.

Upvotes: 0

Related Questions