Nate

Reputation: 414

Consistent start-of-string anchor behavior in .NET Regex Match methods

Regexes in .NET (I’m using 4.5.2) appear to have three (non-static) Match methods:

  1. regex.Match(string input) searches for the first match in input.
  2. regex.Match(string input, int startIndex) searches for the first match in input starting at startIndex.
  3. regex.Match(string input, int startIndex, int length) searches for the first match in a range of input defined by startIndex and length.

If I write

System.Text.RegularExpressions.Regex regex =
    new System.Text.RegularExpressions.Regex("^abc");
string str = "abc abc";

System.Text.RegularExpressions.Match match = regex.Match(str);
System.Diagnostics.Debug.WriteLine(match.Success);

then I see that match.Success is True, as expected. The regex matches the abc at the beginning of str.

If I then write

int index = 4;
match = regex.Match(str, index);
System.Diagnostics.Debug.WriteLine(match.Success);

to search from index 4 to the end of str, then I see that match.Success is False, as expected. There’s an abc at index 4 of str, but index 4 is not the beginning of the string.

However, if I write

match = regex.Match(str, index, str.Length - index);
System.Diagnostics.Debug.WriteLine(match.Success);
System.Diagnostics.Debug.WriteLine(match.Index);

to again search from index 4 to the end of str, then I see that match.Success is unexpectedly True, and match.Index is 4. I would expect to get the same result as calling regex.Match(str, index).

Is there a way to get consistent start-of-string anchor behavior in .NET Regex Match methods?

Upvotes: 3

Views: 144

Answers (1)

Wiktor Stribiżew

Reputation: 626961

According to the comments in the Regex.cs source code, public Match Match(String input, int startat) "finds the first match, starting at the specified position", while public Match Match(String input, int beginning, int length) "finds the first match, restricting the search to the specified interval of the char array".

Combined with your test results (and mine), this shows that the last overload of the Regex.Match method treats the specified range as if it were a new, separate string when passing it to the regex engine, so ^ is satisfied at the start of the range. Changing ^ to \A will not help.
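A minimal sketch of that observation, assuming the default left-to-right matching mode: it compares the three-argument overload against a search of the extracted substring. The class RangeOverloadDemo and the method AgreesWithSubstring are hypothetical names used only for illustration; they are not part of the .NET API.

```csharp
using System;
using System.Text.RegularExpressions;

static class RangeOverloadDemo
{
    // Hypothetical helper: returns true when regex.Match(input, beginning, length)
    // and a search of input.Substring(beginning, length) agree on success and on
    // the match position measured from the start of the range.
    public static bool AgreesWithSubstring(string pattern, string input, int beginning, int length)
    {
        Regex regex = new Regex(pattern);
        Match inRange = regex.Match(input, beginning, length);
        Match inSubstring = regex.Match(input.Substring(beginning, length));

        if (inRange.Success != inSubstring.Success)
        {
            return false;
        }

        // The range overload reports Index relative to the original string,
        // so subtract beginning before comparing.
        return !inRange.Success || inRange.Index - beginning == inSubstring.Index;
    }
}
```

With the values from the question, RangeOverloadDemo.AgreesWithSubstring("^abc", "abc abc", 4, 3) and the same call with @"\Aabc" can be used to check that both anchors behave as if the range were the whole string.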

Thus, to know whether a match is at the real start of the string, you have to add logic to your own code: for example, if index is greater than 0, no match is at the real start of the string. However, the index returned is correct (it is relative to the original string), so this looks like a bug to me.
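One way to add such logic is sketched below, assuming left-to-right matching. It searches with the two-argument overload, so ^ and \A stay tied to the real start of the string, and then rejects a match that runs past the requested range. AnchorAwareMatch and MatchInRange are hypothetical names. This is not a drop-in replacement for the three-argument overload: the engine still sees the text after the range, so a match that overshoots the range is discarded even if a shorter match inside the range would have been possible.

```csharp
using System;
using System.Text.RegularExpressions;

static class AnchorAwareMatch
{
    // Hypothetical helper: searches input from startIndex while keeping
    // ^ and \A anchored to index 0 of input, and only accepts a match that
    // ends within [startIndex, startIndex + length).
    public static Match MatchInRange(Regex regex, string input, int startIndex, int length)
    {
        Match match = regex.Match(input, startIndex);
        int end = startIndex + length;

        if (match.Success && match.Index + match.Length > end)
        {
            // The first match extends past the range; any later match would
            // start even further right, so report failure.
            return Match.Empty;
        }

        return match;
    }
}
```

With regex = new Regex("^abc") and str = "abc abc" from the question, AnchorAwareMatch.MatchInRange(regex, str, 4, str.Length - 4) then gives the same result as regex.Match(str, 4).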

Upvotes: 2
