How to count word occurrences (from words in a specific list) and store the results in a new column in a Pandas DataFrame in Python?

Question

I currently have a list of words about MMA.

I want to create a new column in my Pandas DataFrame called 'MMA Related Word Count'. For each row, I want to analyze the 'Speech' column and count how often the words from the list below occur in the speech. Does anyone know the best way to do this? Thanks in advance!

Please take a look at my dataframe.

CODE EXAMPLE:

import pandas as pd

mma_related_words = ['mma', 'ju jitsu', 'boxing']

data = {
  "Name": ['Dana White', 'Triple H'],
  "Speech": ['mma is a fantastic sport. ju jitsu makes you better as a person.', 'Boxing sucks. Professional dancing is much better.']
}

#load data into a DataFrame object:
df = pd.DataFrame(data)

print(df)

CURRENT DATAFRAME:

Name	Speech
Dana White	mma is a fantastic sport. ju jitsu makes you better as a person.
Triple H	Boxing sucks. Professional dancing is much better.

--

EXPECTED OUTPUT: The same as above, but with a new column on the right called 'MMA Related Word Count'. For Dana White the value should be 2; for Triple H it should be 1.

mozway · Accepted Answer

You can use a regex with str.count:

import re
regex = '|'.join(map(re.escape, mma_related_words))
# 'mma|ju\ jitsu|boxing'

df['Word Count'] = df['Speech'].str.count(regex, flags=re.I)
# or
# df['Word Count'] = df['Speech'].str.count(r'(?i)'+regex)

output:

         Name                                             Speech  Word Count
0  Dana White  mma is a fantastic sport. ju jitsu makes you b...           2
1    Triple H  Boxing sucks. Professional dancing is much bet...           1
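
One caveat with the pattern above: it also matches inside longer words, so 'mma' would be counted in 'dilemma'. If that matters for your data, a possible variant (a sketch, not part of the original answer) is to wrap the alternation in word boundaries. The column name 'MMA Related Word Count' is taken from the question; the variable name `pattern` and the 'dilemma' test sentence are made up for illustration.

```python
import re
import pandas as pd

mma_related_words = ['mma', 'ju jitsu', 'boxing']

df = pd.DataFrame({
    "Name": ['Dana White', 'Triple H'],
    "Speech": [
        'mma is a fantastic sport. ju jitsu makes you better as a person.',
        'Boxing sucks. Professional dancing is much better.',
    ],
})

# Escape each term, join with '|', and wrap the group in \b ... \b
# so that only whole words/phrases are counted.
pattern = r'\b(?:' + '|'.join(map(re.escape, mma_related_words)) + r')\b'

# Case-insensitive count of all matches per row.
df['MMA Related Word Count'] = df['Speech'].str.count(pattern, flags=re.IGNORECASE)

# Illustrative check (made-up sentence): 'dilemma' is not counted, 'MMA' is.
check = pd.Series(['A dilemma about MMA']).str.count(pattern, flags=re.IGNORECASE)

print(df[['Name', 'MMA Related Word Count']])
```

For the sample data this gives the same counts as the unbounded regex (2 and 1). The difference only shows up when a listed term appears as part of a longer word.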
