Kristian Rafteseth

Reputation: 2032

REGEX: How to get the nearest words around the match?

Is it possible to write an expression that matches, let's say, 'FINDTHISWORD' plus up to 5 words before and after it? The catch is that there might be only 0 or 1 words before or after, so it should match 0-5 words, then FINDTHISWORD, then 0-5 words.

Examples it should match:

fdoijfd iudfhiufdh fdhui FINDTHISWORD iduhdfd 
FINDTHISWORD iduhdfd oijfdfd 
doijd FINDTHISWORD

Upvotes: 3

Views: 1408

Answers (3)

Mark Setchell

Reputation: 207445

Maybe less elegant than long regexes and quantifiers, but pretty simple to read and understand:

grep FINDTHIS file | while read -r X; do xargs -n1 <<<"$X" | grep -C5 FINDTHIS | xargs; done

Basically, it finds lines containing FINDTHIS and then reads them one at a time into a loop. In the loop, the words in the line are each put on their own line and then I just use regular grep with a context of 5 (-C5) to get 5 words either side before re-assembling the lines.
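For comparison, the same split-then-window idea can be sketched in Python. This is a rough illustration rather than a translation of the pipeline: the function name `word_windows` and its parameters are made up here, and it returns one window per matching word instead of reproducing how grep merges overlapping context.

```python
def word_windows(line, needle, n=5):
    """Return one window of up to n words either side of each word containing needle."""
    words = line.split()  # whitespace-separated words, like putting one word per line
    windows = []
    for i, word in enumerate(words):
        if needle in word:  # substring match, as plain grep would do
            start = max(0, i - n)  # clamp so the slice never wraps around
            windows.append(" ".join(words[start:i + n + 1]))
    return windows


if __name__ == "__main__":
    for window in word_windows("a b c d e f g FINDTHIS h i j k l m n", "FINDTHIS"):
        print(window)
```

Slicing handles the short cases for free: if there are fewer than 5 words on either side, the window simply stops at the start or end of the line.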

Upvotes: 0

Toto

Reputation: 91385

I'd do:

((?:\p{Xwd}+\P{Xwd}){0,5})\bFINDTHISWORD\b((?:\P{Xwd}\p{Xwd}+){0,5})

where

\p{Xwd} matches any Unicode word character (this is a PCRE-specific property)
\P{Xwd} is the opposite of \p{Xwd}: any character that is not a word character

The words before will be in group 1 and the words after in group 2.
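If you want to try the two-group idea in Python, note that the standard `re` module does not support `\p{...}`. The sketch below swaps in `\w` and `\W`, which are Unicode-aware for `str` patterns in Python 3; that substitution is my assumption and only approximates `\p{Xwd}`. The helper name `neighbours` is hypothetical.

```python
import re


def neighbours(text, word, n=5):
    """Return (words_before, words_after) around the first match, or None."""
    # \w / \W stand in for \p{Xwd} / \P{Xwd}; this substitution is an assumption.
    pattern = re.compile(
        rf"((?:\w+\W){{0,{n}}})\b{re.escape(word)}\b((?:\W\w+){{0,{n}}})"
    )
    m = pattern.search(text)
    if m is None:
        return None
    # Group 1 holds the text before the word, group 2 the text after it;
    # split each into individual words.
    return re.findall(r"\w+", m.group(1)), re.findall(r"\w+", m.group(2))


if __name__ == "__main__":
    print(neighbours("alpha beta-gamma FINDTHISWORD delta,epsilon zeta", "FINDTHISWORD"))
```

Because the pattern allows exactly one non-word character between words, a sequence such as a full stop followed by a space ends the chain of neighbouring words early.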

Upvotes: 0

devnull

Reputation: 123458

You could make use of quantifiers:

(\w+ ){0,5}FINDTHISWORD( \w+){0,5}
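A minimal sketch of how this could be used from Python, assuming words are separated by single spaces as in the question's examples. The helper name `context_window` is made up for illustration. It uses non-capturing groups and returns the whole match, because in Python's `re` a repeated capturing group only keeps its last repetition.

```python
import re


def context_window(text, word, n=5):
    """Return the word plus up to n space-separated words either side, or None."""
    # Same shape as (\w+ ){0,5}WORD( \w+){0,5}, with non-capturing groups.
    pattern = rf"(?:\w+ ){{0,{n}}}{re.escape(word)}(?: \w+){{0,{n}}}"
    m = re.search(pattern, text)
    return m.group(0) if m else None


if __name__ == "__main__":
    samples = [
        "fdoijfd iudfhiufdh fdhui FINDTHISWORD iduhdfd ",
        "FINDTHISWORD iduhdfd oijfdfd ",
        "doijd FINDTHISWORD",
    ]
    for s in samples:
        print(repr(context_window(s, "FINDTHISWORD")))
```

The pattern has no word boundaries, so it would also match FINDTHISWORD inside a longer word; wrapping the word in `\b` is one way to tighten that if it matters for your data.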

Upvotes: 15
