Reputation: 63
I was wondering how I can count how many times each word from a list appears in a specific data frame column.
For example, let's say I have a list = ['John','Bob','Hannah']
Next, I have a data frame with a column called sentences
df =
['sentences']
0 Bob went to the shop
1 John visited Hannah
2 Hannah ate a burger
I want the output to be:
John 1
Bob 1
Hannah 2
How can I count the unique names in any given sentence in any row in a dataset?
Upvotes: 2
Views: 1205
Reputation: 323226
In your case (here Col1 stands for the column that holds the sentences):
list1 = ['John','Bob','Hannah']
df.Col1.str.findall('|'.join(list1)).explode().value_counts()
Hannah 2
Bob 1
John 1
Name: Col1, dtype: int64
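One caveat with a plain '|'.join(list1) pattern is that it matches substrings, so 'Bob' would also be found inside 'Bobby'. Below is a minimal sketch, assuming whole-word matches are what you want: escape each name and wrap the alternation in \b word boundaries. The fourth row and the variable names are made up for illustration and are not from the question.

```python
import re
import pandas as pd

# First three rows come from the question; the last row is a hypothetical
# addition that shows the substring problem.
df = pd.DataFrame({"sentences": [
    "Bob went to the shop",
    "John visited Hannah",
    "Hannah ate a burger",
    "Bobby met Bob",
]})
names = ["John", "Bob", "Hannah"]

# Plain alternation: also matches "Bob" inside "Bobby".
naive = df["sentences"].str.findall("|".join(names)).explode().value_counts()

# Escaped names wrapped in word boundaries: whole words only.
pattern = r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b"
counts = df["sentences"].str.findall(pattern).explode().value_counts()

print(naive)
print(counts)
```

re.escape only matters if a name contains regex metacharacters (a dot, for instance), but it is cheap to include.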
Upvotes: 2
Reputation: 25239
You can also use str.split, explode, and value_counts:
l = ['John', 'Bob', 'Hannah']
df.sentences.str.split().explode().value_counts()[l]
Out[239]:
John 1
Bob 1
Hannah 2
Name: sentences, dtype: int64
However, I think dict comprehension is faster.
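Note that selecting with [l] on the value_counts result raises a KeyError in recent pandas versions if one of the names never occurs in the column. A minimal sketch of one way around that, assuming a zero count is the desired result for absent names, is to use reindex with fill_value=0. The name 'Alice' is hypothetical and added only to show the missing case.

```python
import pandas as pd

# Data frame from the question.
df = pd.DataFrame({"sentences": [
    "Bob went to the shop",
    "John visited Hannah",
    "Hannah ate a burger",
]})
names = ["John", "Bob", "Hannah", "Alice"]  # "Alice" is hypothetical and never occurs

# Count every whitespace-separated token in the column.
token_counts = df["sentences"].str.split().explode().value_counts()

# reindex keeps the order of `names` and fills absent names with 0
# instead of raising a KeyError.
counts = token_counts.reindex(names, fill_value=0)
print(counts)
```

Since str.split() splits on whitespace only, a token such as 'Hannah,' (with trailing punctuation) would not be counted as 'Hannah'.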
Upvotes: 3
Reputation: 18416
You can use Series.str.contains and call sum to get the number of rows in the given column that contain a word. Iterate over the list, do this for each word, and store the results in a dictionary.
list1 = ['John','Bob','Hannah']
output = {}
for word in list1:
    output[word] = df['sentences'].str.contains(word).sum()
OUTPUT:
{'John': 1, 'Bob': 1, 'Hannah': 2}
You can even use it in a dictionary comprehension:
>>> {word: df['sentences'].str.contains(word).sum() for word in list1}
{'John': 1, 'Bob': 1, 'Hannah': 2}
PS: If a word/substring is present multiple times in the same row of the given column, the above method counts those occurrences as 1. If you want every occurrence counted, you can apply the same logic to each cell value.
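To illustrate the difference, here is a minimal sketch that uses Series.str.count as one possible way to count every occurrence per cell. That is my assumption about what "the same logic for each cell value" could look like, not necessarily the only way. The last row is hypothetical and added so that one name appears twice in a single row.

```python
import re
import pandas as pd

# First three rows come from the question; the last row is a hypothetical
# addition with a repeated name.
df = pd.DataFrame({"sentences": [
    "Bob went to the shop",
    "John visited Hannah",
    "Hannah ate a burger",
    "Hannah called Hannah",
]})
list1 = ["John", "Bob", "Hannah"]

# str.contains gives one boolean per row, so the sum is the number of rows
# that contain the word at least once.
rows_containing = {
    word: int(df["sentences"].str.contains(re.escape(word)).sum())
    for word in list1
}

# str.count gives the number of matches within each row, so the sum is the
# total number of occurrences across the column.
total_occurrences = {
    word: int(df["sentences"].str.count(re.escape(word)).sum())
    for word in list1
}

print(rows_containing)
print(total_occurrences)
```

Both calls treat the word as a regular expression, which is why re.escape is applied. Like str.contains, str.count matches substrings, not whole words.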
Upvotes: 4