Reputation: 91
I have the following code, which prints every line of a text that contains a certain word:
import re

# The text to search in
with open('/Users/Statistical_NLP/Project/text.txt') as f:
    haystack = f.read()
# One search word per line
with open('/Users/Statistical_NLP/Project/test.txt') as f:
    for line in f:
        needle = line.strip()
        pattern = '^.*{}.*$'.format(re.escape(needle))
        for match in re.finditer(pattern, haystack, re.MULTILINE):
            print(match.group(0))
How can I search for a word and return not the whole line, but only the three words before it and the three words after it?
Something has to be changed in this line in my code:
pattern = '^.*{}.*$'.format(re.escape(needle))
Thanks a lot
Upvotes: 0
Views: 78
Reputation: 2554
The following regex will help you achieve what you want.
((?:\w+\s+){3}YOUR_WORD_HERE(?:\s+\w+){3})
For a better understanding of the regex, I suggest you go to the following page and experiment with it.
https://regex101.com/r/eS8zW5/3
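The regex above matches exactly three words before YOUR_WORD_HERE, the word itself, and three words after it. It does not match at all if fewer than three words are available on either side.
The following variant matches up to three words before and after, so it still works when fewer exist: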
((?:\w+\s+){0,3}YOUR_WORD_HERE(?:\s+\w+){0,3})
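To plug this into the loop from the question, YOUR_WORD_HERE can be built from each needle with re.escape. Here is a minimal sketch, assuming Python; the helper name words_around and its n parameter are made up for illustration and are not part of the regex above. Note that \w+ does not cross punctuation, so a comma or period next to the word cuts the context short, and \s+ also matches newlines, so the context may span lines.

```python
import re

def words_around(needle, haystack, n=3):
    """Return every occurrence of needle with up to n words on each side."""
    # Quantifier such as {0,3}: "up to n repetitions".
    upto = '{0,%d}' % n
    pattern = (r'(?:\w+\s+)' + upto      # up to n words before
               + re.escape(needle)        # the search word, taken literally
               + r'(?:\s+\w+)' + upto)    # up to n words after
    # No re.MULTILINE needed: the pattern has no ^ or $ anchors.
    return [m.group(0) for m in re.finditer(pattern, haystack)]

text = "the quick brown fox jumps over the lazy dog"
print(words_around("fox", text))
```

For the sample sentence this prints ['the quick brown fox jumps over the']. In the question's code, the inner finditer loop could then become a loop over words_around(needle, haystack).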
Upvotes: 1