Reputation: 695
I know the following function simply returns the count of a word in a sentence. I'm trying to apply it to df['col']
using a list of words. My function is currently set up to loop over each row, but I'd like to know if there's a way to avoid a nested loop.
def count_occurrences(word, sentence):
    return sentence.lower().split().count(word)
For example, say my column is as below. I want to count how many times the words in the following list occur in each sentence, e.g.
list_of_words = ['green','apple']
ind | df['col']
------------------------------------------
1. | 'green apple red apple blue apple'
2. | 'green apple green apple green apple'
Output:
[4,6]
Upvotes: 2
Views: 182
Reputation: 323226
It is better to add \b (a regex word boundary) around the pattern
in case you have a word like pineapple:
df['cnt'] = df['col'].str.count(r'\b(?:' + '|'.join(list_of_words) + r')\b')
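As a rough check of the word-boundary idea, here is a minimal sketch. The third row containing "pineapple" and the use of re.escape are my own additions for illustration; they are not part of the original question.

```python
import re
import pandas as pd

list_of_words = ['green', 'apple']

# The third row is a made-up example to show the effect of \b.
df = pd.DataFrame({'col': ['green apple red apple blue apple',
                           'green apple green apple green apple',
                           'green pineapple']})

# Escape each word, join into one alternation, and require a word
# boundary on both sides so "pineapple" does not count as "apple".
pattern = r'\b(?:' + '|'.join(map(re.escape, list_of_words)) + r')\b'

df['cnt'] = df['col'].str.count(pattern)
print(df['cnt'].tolist())
```

Without the boundaries, the "apple" inside "pineapple" would also be counted in the last row.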
Upvotes: 0
Reputation: 11171
Try this:
df_counts = df["col"].str.split().explode().value_counts()
df_counts[list_of_words]
output:
green 4
apple 6
Name: col, dtype: int64
This may be a more useful output form than a list of counts. Note that these are totals per word across the whole column, not counts per row. Or, if you just want the list:
df_counts[list_of_words].to_list()
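The value_counts call totals each word over the whole column. If per-row counts like those in the question are wanted instead, one possible sketch is to keep the row index after explode and group on it. The names tokens and per_row are only illustrative:

```python
import pandas as pd

list_of_words = ['green', 'apple']
df = pd.DataFrame({'col': ['green apple red apple blue apple',
                           'green apple green apple green apple']})

# One token per line; explode repeats the original row index.
tokens = df['col'].str.lower().str.split().explode()

# Flag tokens that are in the list, then sum the flags per original row.
per_row = tokens.isin(list_of_words).groupby(level=0).sum()
print(per_row.tolist())
```

Because this compares whole tokens, a word such as "pineapple" would not be counted as "apple".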
Upvotes: 0
Reputation: 1979
You can try pandas.Series.str.count:
>>> df
                                   col
0     green apple red apple blue apple
1  green apple green apple green apple
>>>
>>> df.col.str.count('|'.join(list_of_words)).tolist()
[4, 6]
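One difference from the function in the question is that str.count is case-sensitive, while the original code lowercases each sentence first. A small sketch, using a made-up Series with mixed case, of passing flags=re.IGNORECASE to get comparable behavior:

```python
import re
import pandas as pd

list_of_words = ['green', 'apple']

# Hypothetical data with mixed case, not taken from the question.
s = pd.Series(['Green apple red APPLE', 'green apple'])

pattern = '|'.join(map(re.escape, list_of_words))

# Default matching is case-sensitive.
case_sensitive = s.str.count(pattern).tolist()

# re.IGNORECASE makes "Green" and "APPLE" count as well.
case_insensitive = s.str.count(pattern, flags=re.IGNORECASE).tolist()

print(case_sensitive, case_insensitive)
```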
Upvotes: 2