Reputation: 2473
What is the best way to count the number of matches between a list and a string in Python?
for example if I have this list:
list = ['one', 'two', 'three']
and this string:
line = "some one long. two phrase three and one again"
I want to get 4 because I have
one 2 times
two 1 time
three 1 time
I tried the code below, based on the answers to this question, and it worked, but I get an error if I add a lot of words (4000 words) to the list:
import re
word_list = ['one', 'two', 'three']
line = "some one long. two phrase three and one again"
words_re = re.compile("|".join(word_list))
print(len(words_re.findall(line)))
This is my error:
words_re = re.compile("|".join(word_list))
File "/usr/lib/python2.7/re.py", line 190, in compile
Upvotes: 2
Views: 365
Reputation: 180411
If you want case-insensitive matching of whole words, ignoring punctuation, split the string, strip the punctuation from each word, and use a dict to store the counts of the words you want to count:
lst = ['one', 'two', 'three']
from string import punctuation
cn = dict.fromkeys(lst, 0)
line = "some one long. two phrase three and one again"
for word in line.lower().split():
    word = word.strip(punctuation)
    if word in cn:
        cn[word] += 1
print(cn)
{'three': 1, 'two': 1, 'one': 2}
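The same tally could also be sketched with collections.Counter, which avoids pre-filling the dict. This is only an alternative under the same assumptions (whole words, whitespace-separated, case-insensitive); the name wanted is made up for this example:

```python
from collections import Counter
from string import punctuation

lst = ['one', 'two', 'three']
line = "some one long. two phrase three and one again"

# A set gives constant-time membership tests for the words we care about.
wanted = set(lst)

# Lowercase, split on whitespace, strip surrounding punctuation,
# then count only the tokens that are in the wanted set.
counts = Counter(
    word
    for word in (w.strip(punctuation) for w in line.lower().split())
    if word in wanted
)

print(counts)
print(sum(counts.values()))
```

Unlike dict.fromkeys, words that never occur simply don't appear in the result, though looking them up on a Counter still gives 0.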
If you just want the sum, use a set with the same logic:
from string import punctuation
st = {'one', 'two', 'three'}
line = "some one long. two phrase three and one again"
print(sum(word.strip(punctuation) in st for word in line.lower().split()))
This does a single pass over the words after they are split, and the set lookup is O(1), so it is substantially more efficient than list.count.
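For a large word list, the set-based approach could be wrapped in a small function along these lines. This is a minimal sketch: count_matches is a hypothetical name, and it assumes you want whole-word matches on whitespace-separated tokens, so unlike the regex it will not count a word that appears inside a longer word, or multi-word phrases:

```python
from string import punctuation


def count_matches(words, line):
    """Count whitespace-separated tokens in `line` that appear in `words`.

    Matching is case-insensitive and ignores punctuation at the start
    or end of each token.
    """
    # Build the set once so each lookup is a constant-time hash check.
    wanted = {w.lower() for w in words}
    return sum(
        token.strip(punctuation) in wanted
        for token in line.lower().split()
    )


line = "some one long. two phrase three and one again"
print(count_matches(['one', 'two', 'three'], line))  # 4
```

Because nothing is compiled into a single pattern, the size of the word list only affects the cost of building the set.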
Upvotes: 1