ALS_WV

Reputation: 91

Matching word boundaries in regex (Python 2.7)

I have the following code, which returns every line of a text in which a certain word occurs:

import re

# Read the whole text to be searched.
with open('/Users/Statistical_NLP/Project/text.txt') as f:
    haystack = f.read()

# Each line of test.txt holds one word to look for.
with open('/Users/Statistical_NLP/Project/test.txt') as f:
    for line in f:
        needle = line.strip()
        # Match the entire line that contains the word.
        pattern = '^.*{}.*$'.format(re.escape(needle))
        for match in re.finditer(pattern, haystack, re.MULTILINE):
            print(match.group(0))

How can I search for a word and return not the whole line, but only the three words before it and the three words after it?

I assume this line of my code has to change:

pattern = '^.*{}.*$'.format(re.escape(needle))

Thanks a lot

Upvotes: 0

Views: 78

Answers (1)

Harsh Poddar

Reputation: 2554

The following regex will help you achieve what you want.

((?:\w+\s+){3}YOUR_WORD_HERE(?:\s+\w+){3})

For a better understanding of the regex, I suggest you go to the following page and experiment with it.

https://regex101.com/r/eS8zW5/3

This matches exactly three words before the target word, the word itself, and three words after it. It finds nothing when fewer than three words are available on either side.
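As a rough sketch of how this pattern might be built in Python (the helper name `context_pattern` and the sample sentence are made up for illustration): when the pattern goes through `str.format`, as in the question's code, the braces of the `{3}` quantifier have to be doubled so that `format` leaves them alone.

```python
import re

def context_pattern(word):
    # Hypothetical helper: exactly three words on each side of `word`.
    # The quantifier braces are doubled ({{3}}) so str.format emits a literal {3}.
    return r'((?:\w+\s+){{3}}{}(?:\s+\w+){{3}})'.format(re.escape(word))

text = "one two three four needle five six seven eight"
match = re.search(context_pattern("needle"), text)
print(match.group(1) if match else None)
# prints: two three four needle five six seven
```

Because `\s+` also matches newlines, the context can span line breaks, so `re.MULTILINE` and the `^`/`$` anchors are not needed here.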

The following variant matches up to three words before and after the target word, depending on how many exist:

((?:\w+\s+){0,3}YOUR_WORD_HERE(?:\s+\w+){0,3})
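One caveat with the `{0,3}` form: once the surrounding words are optional, nothing stops the target from matching inside a longer word. A possible way around that, sketched here under the assumption that whole-word matches are wanted, is to wrap the word in `\b` word boundaries. The function name `find_contexts` and the sample text are illustrative only.

```python
import re

def find_contexts(word, haystack, n=3):
    # Hypothetical helper: up to n words on each side of `word`.
    # \b keeps `word` from matching inside a longer word (e.g. "needles").
    pattern = r'(?:\w+\s+){0,%d}\b%s\b(?:\s+\w+){0,%d}' % (n, re.escape(word), n)
    return [m.group(0) for m in re.finditer(pattern, haystack)]

haystack = "needle at the start and then a hidden needle"
for context in find_contexts("needle", haystack):
    print(context)
# prints:
# needle at the start
# then a hidden needle
```

In the question's loop, `find_contexts(needle, haystack)` could stand in for the `re.finditer` call on the whole-line pattern.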

Upvotes: 1
