Get first occurrence of match in Regex

Question

I have the following text:

"cat dog mouse lion"

And I search for "dog" or "mouse" using regex:

Regex regex = new Regex(@"dog|mouse");

The way Regex in C# behaves is that it first searches all the way through for the word dog. If it finds a match, it stops. How do I make it stop after finding the first occurrence of any of my words in the regex, meaning stop after "cat" as this occurs first?

Do I have to make multiple regex searches and match the indexes of the findings? Or is it possible to specify it in the regex expression?

Casimir et Hippolyte · Accepted Answer

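One way to do that is to use a lazy quantifier with the dotall option (RegexOptions.Singleline in .NET), so that the dot also matches newlines:

Regex regex = new Regex(@"^.*?\b(?>dog|mouse)\b", RegexOptions.Singleline);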
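As a minimal sketch of how the lazy pattern might be used: the full match of this pattern includes everything skipped from the start of the string, so the version below swaps the atomic group for a capturing group to read the word itself. The class name FirstWordLazy, the method Find, and the choice to return an empty string on no match are assumptions made for illustration, not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FirstWordLazy
{
    // ^.*? lazily skips the prefix; group 1 captures the first of the two words.
    // Singleline is .NET's "dotall": the dot also matches '\n'.
    private static readonly Regex Pattern =
        new Regex(@"^.*?\b(dog|mouse)\b", RegexOptions.Singleline);

    // Hypothetical helper: returns the first word found, or "" if none.
    public static string Find(string text)
    {
        Match m = Pattern.Match(text);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(Find("cat dog mouse lion"));  // dog
        Console.WriteLine(Find("cat mouse dog lion"));  // mouse
    }
}
```

The position of the word in the input is available as m.Groups[1].Index if it is needed.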

Another way is:

Regex regex = new Regex(@"^(?>[^dm]+|(?>d+)(?!og\b)|(?>m+)(?!ouse\b))*\b(?>dog|mouse)\b");

It is longer but more efficient. The idea is to avoid the lazy quantifier, which is slow because at each character it has to test whether the rest of the pattern follows. Here I describe the beginning as "anything that is not a d or an m, OR a run of d's not followed by og, OR a run of m's not followed by ouse", zero or more times.

(?>...) is an atomic group: once the group has matched, the regex engine cannot backtrack into it. It is a kind of "all or nothing".

In other regex flavors the same thing is often written with possessive quantifiers such as *+ and ++, which also prevent backtracking. .NET does not support that syntax (the Regex constructor rejects it as a nested quantifier), so in C# an atomic group such as (?>d+) takes its place.
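A minimal sketch of the longer pattern in use, assuming .NET's System.Text.RegularExpressions. Because .NET has atomic groups but no possessive-quantifier syntax, the prefix is written with atomic groups only, and the final group is made capturing so the word can be read back. The names FirstWordAtomic and Find are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FirstWordAtomic
{
    // Prefix: runs of non-d/m characters, or runs of d not followed by "og",
    // or runs of m not followed by "ouse". Each piece is atomic, so the
    // engine never backtracks into text it has already skipped.
    private static readonly Regex Pattern = new Regex(
        @"^(?>[^dm]+|(?>d+)(?!og\b)|(?>m+)(?!ouse\b))*\b(dog|mouse)\b");

    // Hypothetical helper: returns the first word found, or "" if none.
    public static string Find(string text)
    {
        Match m = Pattern.Match(text);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(Find("cat dog mouse lion"));  // dog
        Console.WriteLine(Find("cat mouse dog lion"));  // mouse
        Console.WriteLine(Find("madam dog"));           // dog
    }
}
```

The third call shows the lookaheads at work: the d and m characters inside "madam" are skipped because they are not followed by "og" or "ouse".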
