cryptojesus

Reputation: 375

How to get the whole word starting at an index in a string in Python

I have something like:

import re
text = 'hi this is john my name is john im bad boy'
target = 'is john'
# Allow any whitespace (including newlines) between the words of the phrase
pattern = re.compile(r'\b' + target.replace(' ', r'\s*'), flags=re.I | re.X)
indices = [m.start() for m in pattern.finditer(text)]

I then want to find the word before and after each occurrence in indices (i.e. 'this' and 'my' for the first match, 'name' and 'im' for the second). However, I want to avoid using a regex to find the words outright, because it is too slow on bigger files, especially when I want n > 1 words on each side of each occurrence of target. So, given the indices, how can I get the words before and after the match at each index?

Upvotes: 1

Views: 1166

Answers (1)

Prune

Reputation: 77837

Split the string at your search phrase, then take the "boundary" words from the resulting fragments:

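text = 'hi this is john my name is john im bad boy'
phrase = 'is john'   # the plain search string, not the compiled regex
frag_list = text.split(phrase)
for i in range(len(frag_list) - 1):
    left  = frag_list[i].split()       # Words of the left  fragment
    right = frag_list[i + 1].split()   # Words of the right fragment
    before = left[-1] if left  else None   # Last  word of left  fragment (None at start of text)
    after  = right[0] if right else None   # First word of right fragment (None at end of text)
    # Do what you need to with the two words.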
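If you need n > 1 words on each side, the same idea can work directly from the match positions. This is only a rough sketch: `context_words` and its parameters are names I made up for illustration, and I am assuming the phrase should match on word boundaries with any whitespace between its words. The `maxsplit` argument keeps `split` and `rsplit` from tokenizing more of the text than needed, though slicing the text still copies it, so for very large inputs you may want to limit the slice to a window around the match.

```python
import re

def context_words(text, phrase, n=1):
    """Yield (words_before, words_after) for each occurrence of phrase.

    Hypothetical helper for illustration; n is the number of words per side.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Match the phrase on word boundaries, allowing any whitespace between its words.
    pattern = re.compile(
        r'\b' + r'\s+'.join(map(re.escape, phrase.split())) + r'\b', re.I
    )
    for m in pattern.finditer(text):
        # maxsplit=n stops after n splits, so only the nearest words are separated out.
        before = text[:m.start()].rsplit(maxsplit=n)[-n:]
        after = text[m.end():].split(maxsplit=n)[:n]
        yield before, after

text = 'hi this is john my name is john im bad boy'
print(list(context_words(text, 'is john')))
# [(['this'], ['my']), (['name'], ['im'])]
```

Near the start or end of the text the lists simply come back shorter than n, so there is no IndexError to guard against.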

Does that help?

Upvotes: 2
