Reputation: 3041
Please point me to a post if one already exists for this question.
How might I efficiently add word-boundary syntax to a list of strings?
For instance, I want each word in badpositions below to match only as a whole word, so I'd like to use re.search(r'\bword\b', text).
How do I get the words in badpositions into the form [r'\bPresident\b', r'\bProvost\b'], etc.?
text = ['said Duke University President Richard H. Brodhead. "Our faculty look forward']
badpositions = ['President', 'Provost', 'University President', 'Senior Vice President']
Upvotes: 3
Views: 1186
Reputation: 54213
import re
# Assumes `text` is a single string (use text[0] if it is kept in a list).
re_badpositions = [r"\b{word}\b".format(word=word) for word in badpositions]
indexes = {word: re.search(pattern, text) for word, pattern in zip(badpositions, re_badpositions)}
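If I understand you correctly, you're looking to find the starting index of every word that matches exactly (that is, \bWORD\b) in your text string. Note that re.search returns a match object (or None when nothing matches), so call .start() on the result to get the index. This is how I'd do it, though I'm certainly adding a step here; you could just as easily do: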
indexes = {word: re.search(r"\b{word}\b".format(word=word), text) for word in badpositions}
I find it a little more intelligible to create a list of regexes first and then search with them separately, rather than plunking those regexes in place at the same time. This is ENTIRELY due to personal preference, though.
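For anyone who wants to run this end to end, here is a minimal sketch using the sample data from the question. It assumes text is a single string rather than a one-element list. The names patterns and starts, and the use of re.escape, are my own additions for illustration and are not something the question requires.

```python
import re

# Sample data from the question; `text` is assumed to be a plain string here.
text = 'said Duke University President Richard H. Brodhead. "Our faculty look forward'
badpositions = ['President', 'Provost', 'University President', 'Senior Vice President']

# The r prefix keeps \b as a regex word boundary. In a plain string literal,
# "\b" is the backspace character, so the pattern would not match.
# re.escape (an addition, not in the original answer) guards against
# regex metacharacters appearing inside a word.
patterns = {
    word: re.compile(r"\b{}\b".format(re.escape(word)))
    for word in badpositions
}

# Map each phrase to the start index of its first whole-word match, or None.
starts = {}
for word, pattern in patterns.items():
    match = pattern.search(text)
    starts[word] = match.start() if match else None

print(starts)
```

With the sample text, 'President' maps to 21 and 'University President' to 10, while the two phrases that do not occur map to None.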
Upvotes: 6