Maggie

Reputation: 6093

Search list: match only exact word/string

How can I match an exact string or word while searching a list? I have tried, but the result is not correct. Below are a sample list, my code, and the test results.

list = ['Hi, hello', 'hi mr 12345', 'welcome sir']

my code:

for str in list:
  if s in str:
    print str

test results:

s = "hello" ~ expected output: 'Hi, hello' ~ output I get: 'Hi, hello'
s = "123" ~ expected output: *nothing* ~ output I get: 'hi mr 12345'
s = "12345" ~ expected output: 'hi mr 12345' ~ output I get: 'hi mr 12345'
s = "come" ~ expected output: *nothing* ~ output I get: 'welcome sir'
s = "welcome" ~ expected output: 'welcome sir' ~ output I get: 'welcome sir'
s = "welcome sir" ~ expected output: 'welcome sir' ~ output I get: 'welcome sir'

My list contains more than 200K strings

Upvotes: 2

Views: 16946

Answers (5)

Aamir Rind

Reputation: 39659

Use a regular expression with the word boundary \b to match the exact word:

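import re
.....
for str in list:
    # Both boundaries must be raw strings: a plain '\b' is a backspace character.
    # re.escape keeps any special characters in wordToLook literal.
    if re.search(r'\b' + re.escape(wordToLook) + r'\b', str):
        print(str)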

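\b matches only at a word boundary, that is, between a word character (a letter, digit, or underscore) and a non-word character such as a space or punctuation mark, or at the start or end of the string.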
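A quick check of how \b behaves on the sample list from the question. This is only a sketch, assuming Python 3; whole_word_matches is a name made up for illustration and is not part of the re module.

```python
import re

def whole_word_matches(needle, strings):
    """Return the strings that contain needle as a whole word or phrase."""
    # re.escape keeps characters such as '.' or '+' in needle literal.
    pattern = re.compile(r'\b' + re.escape(needle) + r'\b')
    return [text for text in strings if pattern.search(text)]

samples = ['Hi, hello', 'hi mr 12345', 'welcome sir']
print(whole_word_matches('123', samples))          # []
print(whole_word_matches('12345', samples))        # ['hi mr 12345']
print(whole_word_matches('welcome sir', samples))  # ['welcome sir']
```

'123' is rejected because the character after it in '12345' is another digit, so there is no boundary at that position.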

Alternatively, compile a single pattern for all the words you want to look for, so that you do not have to run a separate search for each one:

import re
list = ['Hi, hello', 'hi mr 12345', 'welcome sir']
listOfWords = ['hello', 'Mr', '123']
reg = re.compile(r'(?i)\b(?:%s)\b' % '|'.join(listOfWords))
for str in list:
   if reg.search(str):
      print str

(?i) makes the search case-insensitive; remove it if you want a case-sensitive search.
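The same idea can be wrapped in a small helper. This is a sketch under a few assumptions: Python 3, a hypothetical function name build_word_pattern, and re.escape applied to each word so that characters such as '.' or '|' in the search words are treated literally. The re.IGNORECASE flag plays the role of (?i).

```python
import re

def build_word_pattern(words, ignore_case=True):
    """Compile one pattern that matches any of words as a whole word."""
    alternatives = '|'.join(re.escape(word) for word in words)
    flags = re.IGNORECASE if ignore_case else 0
    return re.compile(r'\b(?:%s)\b' % alternatives, flags)

samples = ['Hi, hello', 'hi mr 12345', 'welcome sir']
pattern = build_word_pattern(['hello', 'Mr', '123'])
matches = [text for text in samples if pattern.search(text)]
print(matches)  # ['Hi, hello', 'hi mr 12345']
```

Compiling once and reusing the pattern keeps the loop over a 200K-string list to a single search call per string.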

Upvotes: 0

Vader

Reputation: 3883

>>> l = ['Hi, hello', 'hi mr 12345', 'welcome sir']
>>> search = lambda word: filter(lambda x: word in x.split(),l)
>>> search('123')
[]
>>> search('12345')
['hi mr 12345']
>>> search('hello')
['Hi, hello']

Upvotes: 1

Sven Marnach

Reputation: 601679

Provided s only ever consists of just a few words, you could do

s = s.split()
n = len(s)
for x in my_list:
    words = x.split()
    if s in (words[i:i+n] for i in range(len(words) - n + 1)):
        print x

If s consists of many words, there are more efficient, but also much more complex, algorithms for this.
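The sliding-window comparison above can also be written as a reusable function. This is a sketch assuming Python 3; contains_phrase is a hypothetical name, and an empty search phrase is treated as no match, which is a choice made here rather than something the answer specifies.

```python
def contains_phrase(phrase, text):
    """True if the words of phrase occur consecutively in text."""
    target = phrase.split()
    words = text.split()
    n = len(target)
    if n == 0:
        return False  # assumption: an empty phrase matches nothing
    # Compare every window of n consecutive words against the target.
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

samples = ['Hi, hello', 'hi mr 12345', 'welcome sir']
print([t for t in samples if contains_phrase('welcome sir', t)])  # ['welcome sir']
print([t for t in samples if contains_phrase('come', t)])         # []
```

Because the split is on whitespace only, punctuation stays attached to words: 'Hi' does not match 'Hi, hello', since the first word there is 'Hi,'.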

Upvotes: 0

fransua

Reputation: 1608

If you are looking for whole-word matches (this prints every string that shares at least one whole word with s):

for str in list:
  if set(s.split()) & set(str.split()):
    print str

Upvotes: 0

Roman Bodnarchuk

Reputation: 29727

It looks like you need to perform this search more than once, so I would recommend converting your list into a dictionary:

>>> l = ['Hi, hello', 'hi mr 12345', 'welcome sir']
>>> d = dict()
>>> for item in l:
...     for word in item.split():
...             d.setdefault(word, list()).append(item)
...

So now you can easily do:

>>> d.get('hi')
['hi mr 12345']
>>> d.get('come')    # nothing
>>> d.get('welcome')
['welcome sir']

P.S. You will probably have to improve item.split() to handle commas, periods, and other separators, perhaps with a regex and \w.

P.P.S. As cularion mentioned, this won't match "welcome sir". If you want to match the whole string, that takes just one additional line in the proposed solution. But if you have to match part of a string bounded by spaces and punctuation, a regex should be your choice.
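One way the regex tokenization mentioned in the P.S. might look, as a sketch only. It assumes Python 3, a hypothetical helper name build_index, and that lowercasing every word is acceptable; none of these come from the answer itself.

```python
import re
from collections import defaultdict

def build_index(strings):
    """Map each lowercased word to the list of strings that contain it."""
    index = defaultdict(list)
    for text in strings:
        # \w+ drops commas and other punctuation; set() avoids listing
        # a string twice when the same word repeats inside it.
        for word in set(re.findall(r'\w+', text.lower())):
            index[word].append(text)
    return index

samples = ['Hi, hello', 'hi mr 12345', 'welcome sir']
index = build_index(samples)
print(index.get('hi', []))    # ['Hi, hello', 'hi mr 12345']
print(index.get('come', []))  # []
```

The index is built once, after which each lookup is a single dictionary access, which suits repeated searches over a large list. Lookups should lowercase the search word as well, since the keys are stored in lowercase.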

Upvotes: 1
