Jodrell

Reputation: 35746

Limit Regex to word boundaries

I have some text

"Lorem ipsum dolor sit amet, consectetuer adipiscing elit."

And I have a regex that is generated from user input:

@".*ip.*"

This matches the whole line, as you would expect, so I wrapped the expression in word boundaries:

@"\b.*ip.*\b"

Because the quantifiers are greedy, this still matches (almost) the whole text. So I tried making the repetition lazy:

@"\b.*?ip.*?\b"

This is better, but it matches:

  1. Lorem ipsum
  2. dolor sit amet, consectetuer adipiscing
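
The behaviour described so far can be reproduced with a small sketch. It uses Python's re module rather than .NET purely for illustration, on the assumption that the constructs involved (., \b, greedy and lazy quantifiers) behave the same way on this ASCII input. Note that the second lazy match starts with the space after "ipsum", because the position right after a word also counts as a word boundary.

```python
import re

text = "Lorem ipsum dolor sit amet, consectetuer adipiscing elit."

# Greedy, no boundaries: .* swallows the whole line.
greedy = re.findall(r".*ip.*", text)

# Greedy with \b: still nearly everything. Only the final "." is dropped,
# because the last word boundary sits between "t" and ".".
bounded = re.findall(r"\b.*ip.*\b", text)

# Lazy with \b: "." can still cross spaces, so each match starts at the
# first available boundary and runs to the end of the word containing "ip".
lazy = re.findall(r"\b.*?ip.*?\b", text)

print(greedy)
print(bounded)
print(lazy)
```

The underlying issue is that . matches spaces and punctuation, so \b alone cannot confine a match to a single word.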

How can I extend the original @".*ip.*" pattern so that it lazily matches whole words and captures only the following?

  1. ipsum
  2. adipiscing

This regex tester may be useful for answering the question.

Upvotes: 0

Views: 147

Answers (3)

Rohit Jain

Reputation: 213391

Why not just use \w* instead of .*?, like this:

@"\w*ip\w*"

This will also match _ and the digits 0-9, since they are included in \w. If you want to exclude them, you can use [a-zA-Z]* explicitly instead of \w* there. Note that [a-zA-Z] also excludes non-ASCII letters, which \w would match.
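
A minimal sketch of the difference between the two character classes, written in Python's re module for illustration (the assumption being that \w and [a-zA-Z] behave the same way in .NET for this ASCII input). The `sample` string is a hypothetical input, not taken from the question.

```python
import re

text = "Lorem ipsum dolor sit amet, consectetuer adipiscing elit."
# Hypothetical sample with an underscore and a digit inside words.
sample = "zip_code ship2 ipsum"

word_chars = re.compile(r"\w*ip\w*")               # letters, digits, underscore
letters_only = re.compile(r"[a-zA-Z]*ip[a-zA-Z]*")  # ASCII letters only

on_text = word_chars.findall(text)
with_word_chars = word_chars.findall(sample)
with_letters = letters_only.findall(sample)

print(on_text)          # ['ipsum', 'adipiscing']
print(with_word_chars)  # ['zip_code', 'ship2', 'ipsum']
print(with_letters)     # ['zip', 'ship', 'ipsum']
```

No \b is needed here: because \w* is greedy and cannot cross a non-word character, each match already spans the whole word.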

Upvotes: 5

Sergey Berezovskiy

Reputation: 236328

Some words can contain a hyphen, so I think it's better to use the pattern [\w-]*ip[\w-]*
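
As a rough illustration of what the hyphen buys you, here is a sketch in Python's re module (assumed to behave like .NET for these constructs). The `sample` sentence is a hypothetical input.

```python
import re

# Hypothetical sample containing hyphenated words.
sample = "a clip-on tie and a zip-up hoodie"

# Without the hyphen in the class, the match stops at the hyphen.
plain = re.findall(r"\w*ip\w*", sample)

# With the hyphen, the whole hyphenated word is captured.
hyphenated = re.findall(r"[\w-]*ip[\w-]*", sample)

print(plain)       # ['clip', 'zip']
print(hyphenated)  # ['clip-on', 'zip-up']
```

The class does not require hyphens to sit between letters, so leading or trailing hyphens attached to a word would be captured as well.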

Upvotes: 1

Guido

Reputation: 958

You were already close to the solution. Just replace the dot (any character) with the non-whitespace escape sequence \S:

@"\b\S*?ip\S*?\b"
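
A quick check of this pattern, sketched in Python's re module on the assumption that \S, \b and lazy quantifiers behave the same way in .NET here. The second input, "re-zip-up", is a hypothetical example added to show an edge case.

```python
import re

text = "Lorem ipsum dolor sit amet, consectetuer adipiscing elit."
pattern = re.compile(r"\b\S*?ip\S*?\b")

on_text = pattern.findall(text)

# Hypothetical input with punctuation inside a token.
on_hyphenated = pattern.findall("re-zip-up")

print(on_text)        # ['ipsum', 'adipiscing']
print(on_hyphenated)  # ['re-zip']
```

The two lazy parts are not symmetric: the leading \S*? may cross punctuation to reach "ip", while the trailing \S*? stops at the first word boundary after it.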

Upvotes: 1
