Add in word boundary syntax to list of strings

Question

Please point me to a post if one already exists for this question.

How might I efficiently add word boundary syntax to a list of strings?

For instance, I want each word in badpositions below to match only as a whole word, so I'd like to use re.search(r'\bword\b', text).

How do I get the words in badpositions to take the form [r'\bPresident\b', r'\bProvost\b'], etc.?

text = ['said Duke University President Richard H. Brodhead. "Our faculty look forward']
badpositions = ['President', 'Provost', 'University President', 'Senior Vice President']

Adam Smith · Accepted Answer

import re

re_badpositions = [r"\b{word}\b".format(word=word) for word in badpositions]

# text is a one-element list in the question, so search its first string
indexes = {word: re.search(pattern, text[0]) for word, pattern in zip(badpositions, re_badpositions)}

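If I understand you correctly, you're looking to find the starting index of every word that matches exactly (that is, \bWORD\b) in your text string. Note that re.search returns a match object (or None when nothing matches), so call .start() on the result to get the index itself. This is how I'd do it, though I'm adding a step here; you could just as easily do: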

indexes = {word: re.search(r"\b{word}\b".format(word=word), text[0]) for word in badpositions}

I find it a little more intelligible to create a list of regexes first and then search with them separately, rather than plunking those regexes in place at the same time. This is ENTIRELY due to personal preference, though.
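
If you want the start positions themselves rather than match objects, here is a minimal sketch of one way to do it. The helper names whole_word_pattern and find_starts are made up for illustration and are not part of the re module. It also assumes the phrases should be treated as literal text, so it passes each one through re.escape in case a phrase ever contains regex metacharacters.

```python
import re

text = ['said Duke University President Richard H. Brodhead. "Our faculty look forward']
badpositions = ['President', 'Provost', 'University President', 'Senior Vice President']


def whole_word_pattern(phrase):
    """Compile a pattern that matches `phrase` only as a whole word or phrase."""
    # The raw string keeps \b as a regex word boundary instead of a backspace.
    # re.escape treats the phrase as literal text (an assumption of this sketch).
    return re.compile(r"\b{}\b".format(re.escape(phrase)))


def find_starts(patterns, string):
    """Map each phrase to the start index of its first match, or None."""
    starts = {}
    for phrase, pattern in patterns.items():
        match = pattern.search(string)
        starts[phrase] = match.start() if match else None
    return starts


patterns = {phrase: whole_word_pattern(phrase) for phrase in badpositions}
starts = find_starts(patterns, text[0])  # text is a one-element list
print(starts)
```

With the sample data from the question, 'President' maps to 21 and 'University President' to 10, while the two phrases that do not appear map to None.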
