b24

Reputation: 2473

Best way to count the number of matches between a list and a string in Python

What is the best way to count the number of matches between a list of words and a string in Python?

For example, if I have this list:

word_list = ['one', 'two', 'three']

and this string:

line = "some one long. two phrase three and one again"

I want to get 4, because the string contains:

one 2 times
two 1 time
three 1 time

I tried the code below, based on the answers to this question, and it worked, but I get an error if I add a lot of words (4000 words) to the list:

import re
word_list = ['one', 'two', 'three']
line = "some one long. two phrase three and one again"
words_re = re.compile("|".join(word_list))
print(len(words_re.findall(line)))

This is my error:

words_re = re.compile("|".join(word_list))
  File "/usr/lib/python2.7/re.py", line 190, in compile

Upvotes: 2

Views: 365

Answers (1)

Padraic Cunningham

Reputation: 180411

If you want case-insensitive matching of whole words, ignoring punctuation, split the string, strip the punctuation from each word, and use a dict to store the counts of the words you are looking for:

from string import punctuation

lst = ['one', 'two', 'three']
cn = dict.fromkeys(lst, 0)  # one counter per target word, starting at 0
line = "some one long. two phrase three and one again"

for word in line.lower().split():
    word = word.strip(punctuation)  # e.g. "long." becomes "long"
    if word in cn:
        cn[word] += 1

print(cn)

{'three': 1, 'two': 1, 'one': 2}
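
The same tally can also be written with collections.Counter from the standard library. This is only a sketch of an alternative, not a replacement for the loop above: the helper name count_each is made up for illustration, and it assumes the target words are already lower-case.

```python
from collections import Counter
from string import punctuation


def count_each(targets, line):
    """Return a dict mapping each target word to its whole-word count in line.

    count_each is a hypothetical helper; targets are assumed to be lower-case.
    """
    # Normalise the text the same way as the loop: lower-case, split, strip punctuation.
    tokens = (word.strip(punctuation) for word in line.lower().split())
    seen = Counter(tokens)
    # A Counter returns 0 for missing keys, so absent targets are reported as 0.
    return {word: seen[word] for word in targets}


counts = count_each(['one', 'two', 'three'],
                    "some one long. two phrase three and one again")
print(counts)
print(sum(counts.values()))  # total number of matches: 4
```

Unlike the dict.fromkeys version, this counts every token in the line first and then picks out the targets, so it uses more memory on long text with many distinct words.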

If you just want the total, use a set with the same logic:

from string import punctuation

st = {'one', 'two', 'three'}
line = "some one long. two phrase three and one again"

print(sum(word.strip(punctuation) in st for word in line.lower().split()))

This makes a single pass over the words after the string is split, and each set lookup is O(1) on average, so it is substantially more efficient than using list.count.
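
To make the comparison concrete, here is a rough sketch that puts the set-based pass next to a list.count version. Both function names are invented for this example, and the list.count variant is only my guess at what the slower approach would look like.

```python
from string import punctuation


def count_with_set(targets, line):
    # One pass over the tokens; each membership test is a hash lookup,
    # so the work grows with the number of tokens only.
    wanted = set(targets)
    return sum(word.strip(punctuation) in wanted
               for word in line.lower().split())


def count_with_list_count(targets, line):
    # One full scan of the token list per target word, so the work grows
    # with len(targets) * len(tokens).
    tokens = [word.strip(punctuation) for word in line.lower().split()]
    return sum(tokens.count(target) for target in targets)


line = "some one long. two phrase three and one again"
targets = ['one', 'two', 'three']
print(count_with_set(targets, line))         # 4
print(count_with_list_count(targets, line))  # 4
```

With 4000 target words the second version rescans the whole line 4000 times, while the first still scans it once. The two also differ if the target list contains duplicates: the set ignores them, whereas list.count would count the same word twice.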

Upvotes: 1
