Reputation: 375
I have something like:
import re
text = 'hi this is john my name is john im bad boy'
target = 'is john'
target = target.replace(' ', r'[\s\n]*')  # raw string avoids the invalid-escape warning for \s
target = re.compile(r'\b%s' % target, flags=re.I | re.X)
indices = [m.start() for m in re.finditer(target, text)]
I then want to find the words before and after each occurrence in indices (i.e. 'this' and 'my', then 'name' and 'im'). However, I want to avoid using a regex to capture the words outright, because it is too slow on bigger files, especially when I want n > 1 words on each side of each occurrence of target. So I have the indices, and I want to get the words before and after the match at each index.
Upvotes: 1
Views: 1166
Reputation: 77837
split the string at your search phrase. Then take the "boundary" words from the resulting sentence fragments:
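# target is a compiled pattern, so call its split(); str.split() only accepts a string
frag_list = target.split(text)
for i in range(len(frag_list) - 1):
    left = frag_list[i].split()
    right = frag_list[i + 1].split()
    before = left[-1] if left else None   # Last word of left fragment (None if the match starts the text)
    after = right[0] if right else None   # First word of right fragment (None if the match ends the text)
    # Do what you need to with the two words.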
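If you need n > 1 words on each side, one possible variation is to work from the match positions directly and let split/rsplit stop after n words. This is only a sketch: context_words is a hypothetical helper name, not something from the question, and it assumes pattern is already compiled and that words are whitespace-separated.

```python
import re

def context_words(text, pattern, n=1):
    """Return a (before, after) pair of word lists for each match of pattern.

    context_words is a hypothetical helper; pattern is assumed to be compiled.
    """
    results = []
    for m in pattern.finditer(text):
        # maxsplit=n means only the n words nearest the match get split off
        before = text[:m.start()].rsplit(None, n)[-n:]
        after = text[m.end():].split(None, n)[:n]
        results.append((before, after))
    return results

text = 'hi this is john my name is john im bad boy'
pattern = re.compile(r'\bis\s*john', flags=re.I)
print(context_words(text, pattern, n=2))
```

Each slice copies part of the text, so on very large inputs it may be worth bounding the slice to a window around the match rather than taking everything before or after it.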
Does that help?
Upvotes: 2