Learning

Reputation: 20001

C# Regex to match a keyword in text and get a few words around the match

I need to match a keyword in some text and get the words around the match.

For example, my text is in HTML format, and I will use the following as a sample:

<p>Do not forget the error handling, I don't exactly know what happens if it wants to replace an occurence and can't find it</p>
<p>Edit: If you have multiple entries which should be replaced, loop the replace part until it will not be able to replace anymore then it will throw an error you can catch to continue</p>

MATCH CASE:

Case 1 (the match is in the middle of the text): occurence

RESULT : I don't exactly know what happens if it wants to replace an occurence and can't find it

Case 2 (the match is the first word): Do not

RESULT : Do not forget the error handling, I don't exactly know what happens if it wants to replace an occurence and can't find it

Case 3 (the match is the last word in the text): to continue

RESULT : If you have multiple entries which should be replaced, loop the replace part until it will not be able to replace anymore then it will throw an error you can catch to continue

If the matched word is in the middle of the text, it should get the text around the word. If the matched word is the first word, it should get the text starting from that word.

If the match is the last word, it should get the text before the matched last word.

REGEX (?<=(\w+)\s)?(continue)(?=\s(\w+))?

It matches only the keyword itself. How can I get, say, 10-15 words around the matched keyword?

Is this possible using Regex?

Upvotes: 2

Views: 940

Answers (1)

Tim007

Reputation: 2557

Case 1:

([\w\s']+(?:occurence)[^<]+)|>((?:occurence)[^<]+)|[^>]+(?:occurence)<


Output:

I don't exactly know what happens if it wants to replace an occurence and can't find it

Case 2:

([\w\s']+(?:Do not)[^<]+)|>((?:Do not)[^<]+)|[^>]+(?:Do not)<


Output:

Do not forget the error handling, I don't exactly know what happens if it wants to replace an occurence and can't find it

Case 3:

([\w\s']+(?:to continue)[^<]+)|>((?:to continue)[^<]+)|[^>]+(?:to continue)<


Output:

Edit: If you have multiple entries which should be replaced, loop the replace part until it will not be able to replace anymore then it will throw an error you can catch to continue
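To run the three patterns above from C#, something like the following might work. It is only a sketch: the class and method names are made up for illustration, the keyword is passed through `Regex.Escape` so it is treated literally, and the result is read from whichever alternative matched (group 1, group 2, or the whole match with the trailing `<` removed for the third alternative, which has no capture group).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class SentenceAroundKeyword
{
    public static string Find(string html, string keyword)
    {
        // Escape the keyword so characters such as '.' or '(' are matched literally.
        string k = Regex.Escape(keyword);

        // Same three alternatives as the patterns above, with the keyword substituted in.
        string pattern =
            @"([\w\s']+(?:" + k + @")[^<]+)" +   // keyword in the middle of the text
            @"|>((?:" + k + @")[^<]+)" +          // keyword right after an opening tag
            @"|[^>]+(?:" + k + @")<";             // keyword right before a closing tag

        Match m = Regex.Match(html, pattern);
        if (!m.Success) return string.Empty;

        if (m.Groups[1].Success) return m.Groups[1].Value.Trim();
        if (m.Groups[2].Success) return m.Groups[2].Value.Trim();

        // Third alternative: the match ends with the '<' of the closing tag.
        return m.Value.TrimEnd('<').Trim();
    }
}
```

For example, `SentenceAroundKeyword.Find(html, "Do not")` returns the text of the first paragraph starting at "Do not", and an empty string is returned when the keyword is not found.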

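Limit words (the {0,100} quantifiers cap the number of words captured on each side of the keyword; lower them, e.g. to {0,15}, for a shorter excerpt):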

Case 1:

>(Do not(?:\s(?:[\w']+),?){0,100})|((?:\s(?:[\w']+)){0,100}\s?Do not(?:\s(?:[\w']+),?){0,100})|((?:\s(?:[\w',]+)){0,100}\s?Do not)<


Case 2:

>(occurence(?:\s(?:[\w']+),?){0,100})|((?:\s(?:[\w']+)){0,100}\s?occurence(?:\s(?:[\w']+),?){0,100})|((?:\s(?:[\w',]+)){0,100}\s?occurence)<


Case 3:

>(continue(?:\s(?:[\w']+),?){0,100})|((?:\s(?:[\w']+)){0,100}\s?continue(?:\s(?:[\w']+),?){0,100})|((?:\s(?:[\w',]+)){0,100}\s?continue)<

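If the word count needs to be configurable (the question mentions 10-15 words), the pattern can also be built at run time. The sketch below is a simplified variant, not the exact patterns above: it uses a single alternative, a hypothetical `Around` helper, and assumes that a "word" is a run of word characters, apostrophes, or commas, and that the keyword starts and ends with a word character (because of the `\b` anchors). Since `<` and `>` are not part of a word here, the excerpt does not run across tags.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class KeywordContext
{
    // Returns the keyword plus up to maxWords words on each side,
    // or an empty string when the keyword is not found.
    public static string Around(string html, string keyword, int maxWords)
    {
        // Assumption: a word is made of word characters, apostrophes, or commas.
        string word = @"[\w',]+";

        string pattern =
            "(?:" + word + @"\s+){0," + maxWords + "}" +   // up to maxWords words before
            @"\b" + Regex.Escape(keyword) + @"\b" +         // the keyword, matched literally
            @"(?:\s+" + word + "){0," + maxWords + "}";     // up to maxWords words after

        Match m = Regex.Match(html, pattern);
        return m.Success ? m.Value : string.Empty;
    }
}
```

With the sample HTML, `KeywordContext.Around(html, "occurence", 3)` gives "to replace an occurence and can't find". When the keyword is the first or last word of a paragraph, the corresponding side is simply empty, which covers the other two cases in the question.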

Upvotes: 2
