Reputation: 63

How to count specific words in a list from a pandas DataFrame?

I was wondering how I can count how many times each word from a list appears in a specific data frame. For example, let's say I have list = ['John', 'Bob', 'Hannah']. Next, I have a data frame with a column called sentences:

df =

     sentences
0    Bob went to the shop
1    John visited Hannah
2    Hannah ate a burger

I want the output to be:

John 1

Bob 1

Hannah 2

How can I count the unique names in any given sentence in any row in a dataset?

Upvotes: 2

Answers (3)

BENY

Reputation: 323226

In your case, you can combine str.findall, explode, and value_counts:

list1 = ['John','Bob','Hannah']
df['sentences'].str.findall('|'.join(list1)).explode().value_counts()
Hannah    2
Bob       1
John      1
Name: sentences, dtype: int64
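
One caveat with the joined pattern above is that it matches substrings, so 'Bob' would also be found inside 'Bobby'. A minimal sketch of a stricter variant, assuming whole-word matches are what you want: each name is escaped with re.escape and the alternation is wrapped in \b word boundaries. The 'Bobby stayed home' row is a hypothetical addition to the question's data, there only to show the difference.

```python
import re
import pandas as pd

# Question's data plus one hypothetical row ("Bobby ...") to show the boundary effect
df = pd.DataFrame({"sentences": ["Bob went to the shop",
                                 "John visited Hannah",
                                 "Hannah ate a burger",
                                 "Bobby stayed home"]})
list1 = ["John", "Bob", "Hannah"]

# Escape each name, then require a word boundary on both sides of the match
pattern = r"\b(?:" + "|".join(map(re.escape, list1)) + r")\b"

# findall returns a list of matches per row; explode flattens them; value_counts tallies
counts = df["sentences"].str.findall(pattern).explode().value_counts()
print(counts)
```

Rows with no match become NaN after explode, and value_counts drops them by default, so they do not affect the tally.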

Upvotes: 2

Andy L.

Reputation: 25239

You can also use str.split, explode, and value_counts:

l = ['John', 'Bob', 'Hannah']
df.sentences.str.split().explode().value_counts()[l]

Out[239]:
John      1
Bob       1
Hannah    2
Name: sentences, dtype: int64

However, I think the dict comprehension approach is faster.
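
Note that indexing the counts with [l] raises a KeyError if any name in the list never appears in the column. A possible workaround, sketched here under the assumption that a missing name should simply count as 0, is to use reindex with fill_value=0 instead. 'Alice' is a hypothetical name added to illustrate the missing case.

```python
import pandas as pd

df = pd.DataFrame({"sentences": ["Bob went to the shop",
                                 "John visited Hannah",
                                 "Hannah ate a burger"]})
names = ["John", "Bob", "Hannah", "Alice"]  # "Alice" is hypothetical and absent from df

# Split on whitespace, one word per row, then count every distinct word
word_counts = df["sentences"].str.split().explode().value_counts()

# reindex keeps the order of `names` and fills names that never occur with 0
counts = word_counts.reindex(names, fill_value=0)
print(counts)
```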

Upvotes: 3

ThePyGuy

Reputation: 18416

You can use Series.str.contains and call sum to get the number of rows in the given column that contain a word. Iterate over the list, do this for each word, and store the results in a dictionary.

list1 = ['John','Bob','Hannah']
output = {}
for word in list1:
    output[word] = df['sentences'].str.contains(word).sum()

OUTPUT:

{'John': 1, 'Bob': 1, 'Hannah': 2}

You can even use it in a dictionary comprehension:

>>> {word: df['sentences'].str.contains(word).sum() for word in list1}
{'John': 1, 'Bob': 1, 'Hannah': 2}

PS: If a word/substring is present multiple times in the same row of the given column, the above method counts that row only once. If you want to count every occurrence in that case, you can apply the same logic to each cell value.
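
One way to get per-occurrence counts, as a rough sketch rather than the only option, is Series.str.count, which returns the number of pattern matches in each cell. The last sentence below is modified from the question's data (a hypothetical second 'Hannah') so that the difference from str.contains is visible, and the \b word boundaries are an added assumption that whole-word matches are wanted.

```python
import re
import pandas as pd

# Third sentence is hypothetical: it mentions "Hannah" twice in one row
df = pd.DataFrame({"sentences": ["Bob went to the shop",
                                 "John visited Hannah",
                                 "Hannah ate a burger with Hannah"]})
list1 = ["John", "Bob", "Hannah"]

# str.count gives matches per cell; summing gives total occurrences per word
totals = {
    word: int(df["sentences"].str.count(r"\b" + re.escape(word) + r"\b").sum())
    for word in list1
}
print(totals)
```

With str.contains the same data would report 2 for 'Hannah', because each row is counted at most once.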

Upvotes: 4
