Reputation: 355
I have a list of words. For each row of a pandas column, I want to count how many times any of these words occurs and add a new column with that count.
import pandas as pd

words = ["I", "want", "please"]
data = pd.DataFrame({"col": ["I want to find", "the fastest way", "to count occurrence",
                             "of words in a column", "Can you help please"]})
# Join the words into one regex alternation and count the matches in each row
data["Count"] = data.col.str.count("|".join(words))
print(data)
The code shown here does exactly what I want, but it takes a long time to run on long texts with a long list of words. Can you suggest a faster way to do the same thing?
Thanks
Upvotes: 1
Views: 1185
Reputation: 109546
Perhaps you can use Counter. If you have multiple sets of words to test against the same text, just save the intermediate step after applying Counter. As these counted words are now in a dictionary keyed on the word, it is an O(1) operation to test if this dictionary contains a given word.
from collections import Counter

data["Count"] = (
    data['col'].str.split()
    .apply(Counter)
    # Number of listed words that appear in the row as whole tokens
    .apply(lambda counts: sum(word in counts for word in words))
)
>>> data
                    col  Count
0        I want to find      2
1       the fastest way      0
2   to count occurrence      0
3  of words in a column      0
4   Can you help please      1
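To illustrate saving the intermediate step, here is a minimal sketch in plain Python. The helper names build_counters and count_listed, and the second word list, are made up for illustration and are not part of the original answer. Unlike the snippet above, this variant sums occurrences (counts[word]) rather than testing presence, which may be closer to what the question asks for.

```python
from collections import Counter


def build_counters(texts):
    """Split each text on whitespace and count its tokens once."""
    return [Counter(text.split()) for text in texts]


def count_listed(counters, wanted):
    """Total whole-word occurrences of the wanted words in each row."""
    # A Counter returns 0 for a missing key, so no membership test is needed.
    return [sum(counts[word] for word in wanted) for counts in counters]


texts = ["I want to find", "the fastest way", "to count occurrence",
         "of words in a column", "Can you help please"]

counters = build_counters(texts)  # intermediate step, computed once

# Two word lists checked against the same counters.
# The second list is hypothetical, for illustration only.
count_a = count_listed(counters, ["I", "want", "please"])
count_b = count_listed(counters, ["to", "words"])
print(count_a)
print(count_b)
```

With the DataFrame from the question, build_counters(data["col"]) should accept the column directly, and the result can be assigned back with data["Count"] = count_listed(counters, words). Note that this matches whole whitespace-separated tokens, whereas str.count with a regex also matches inside longer words (for example "I" inside "In"), so the two approaches can give different counts on other data.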
Upvotes: 3