Y.P

Reputation: 355

Fastest way to count occurrence of words in Pandas

I have a list of words. For each row of a Pandas column, I want to count how many times any of those words occurs and store the result in a new column.

import pandas as pd

words = ["I", "want", "please"]
data = pd.DataFrame({"col": ["I want to find", "the fastest way",
                             "to count occurrence", "of words in a column",
                             "Can you help please"]})
# Build the regex "I|want|please" and count its matches in each row
data["Count"] = data.col.str.count("|".join(words))
print(data)

The code above does exactly what I want, but it takes a long time to run on long texts with a long list of words. Can you suggest a faster way to do the same thing?

Thanks

Upvotes: 1

Views: 1185

Answers (1)

Alexander

Reputation: 109546

Perhaps you can use Counter from the collections module. Split each row into words and count them once; if you need to test multiple sets of words against the same text, save that intermediate result after applying Counter and reuse it. Because the counted words are stored in a dictionary keyed on the word, testing whether it contains a given word is an O(1) operation on average.

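from collections import Counter

data["Count"] = (
    data['col'].str.split()   # split each row on whitespace
    .apply(Counter)           # tally the words in each row
    # True counts as 1, so this sums how many of the target words appear in the row
    .apply(lambda counts: sum(word in counts for word in words))
)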
>>> data
                    col  Count
0        I want to find      2
1       the fastest way      0
2   to count occurrence      0
3  of words in a column      0
4   Can you help please      1
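
The expression above adds 1 for each target word that is present in a row, so a word that appears twice is still counted once. If repeated occurrences should count each time, as they do with str.count, one possible variant is to sum the Counter values instead. This is only a sketch: count_occurrences and the extra last row are hypothetical names and data added for illustration, and plain Python lists stand in for the DataFrame column.

```python
from collections import Counter

words = ["I", "want", "please"]
rows = ["I want to find", "the fastest way", "to count occurrence",
        "of words in a column", "Can you help please",
        "I want what I want"]  # extra row (not in the question) with repeats


def count_occurrences(text, targets):
    """Return how many whitespace-separated tokens of text are in targets."""
    # Tally every token once; a Counter returns 0 for tokens it never saw.
    counts = Counter(text.split())
    # Add up the tallies of the target words, so repeats are included.
    return sum(counts[word] for word in targets)


result = [count_occurrences(text, words) for text in rows]
print(result)
```

With the DataFrame from the question, the same list comprehension over data["col"] could be assigned to data["Count"]. Note that this matches whole whitespace-separated tokens, whereas str.count with a joined pattern also matches inside longer words (for example "want" inside "wanted"), so the two approaches can give different counts on real text.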

Upvotes: 3
