Pipeline

Reputation: 1059

Regex.Matches is skipping over a match? C#

I need to identify substrings found in a string such as:

"CityABCProcess Test" or "cityABCProcess Test"

to yield:

[ "City/city", "ABC", "Process", "Test" ]

  1. The first substring can start with either a lowercase or an uppercase letter ("City" or "city").
  2. A run of consecutive uppercase letters forms its own substring. It ends at a space, or just before the uppercase letter that starts the next capitalized word: "ABCProcess" -> "ABC", "ABC Process" -> "ABC".
  3. If an uppercase letter is followed by a lowercase letter, the substring runs until the next uppercase letter or space.

The regular expression we have been using is:

"[A-Z][a-z]+|([A-Z]|[0-9])+\b|[A-Z]+(?=[A-Z])|([a-z]|[0-9])+"
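For reference, here is a minimal console sketch that applies this pattern to the two sample inputs. It assumes C# 9 or later (top-level statements), writes the pattern as a verbatim `@"..."` string so that `\b` reaches the regex engine as a word boundary, and uses a hypothetical helper named `Segment`. It is an illustration, not the code from the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Pattern from the question, written as a verbatim string so that \b stays
// a regex word-boundary token instead of becoming a C# backspace escape.
const string Pattern = @"[A-Z][a-z]+|([A-Z]|[0-9])+\b|[A-Z]+(?=[A-Z])|([a-z]|[0-9])+";

// Hypothetical helper: returns the text of every match, left to right.
string[] Segment(string input) =>
    Regex.Matches(input, Pattern)
         .Cast<Match>()
         .Select(m => m.Value)
         .ToArray();

foreach (var input in new[] { "CityABCProcess Test", "cityABCProcess Test" })
{
    // "City" comes from [A-Z][a-z]+, "city" from ([a-z]|[0-9])+,
    // and "ABC" from [A-Z]+(?=[A-Z]), which stops before the "P" of "Process".
    Console.WriteLine($"{input} -> [{string.Join(", ", Segment(input))}]");
}
```

Both inputs segment as described in the rules: "City"/"city", "ABC", "Process", "Test".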

This has been working well, but it breaks on a string such as:

"X-999"

We are implementing it in this fashion:

        StringBuilder builder = new StringBuilder();
        builder.Append("[A-Z][a-z]+|([A-Z]|[0-9])+\b|[A-Z]+(?=[A-Z])|([a-z]|[0-9])+");

        foreach (Match match in Regex.Matches(name, builder.ToString()))
        {
            //do things with each match
        }

The problem is that it matches only '999', not 'X'. Any ideas? I tested the pattern on regexr.com, and it says it should match both substrings.

Upvotes: 3

Views: 187

Answers (1)

Wiseguy

Reputation: 20873

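In a regular (non-verbatim) C# string literal, \b is the escape sequence for a backspace character (\u0008), so the regex engine receives a literal backspace instead of the word-boundary token \b. "X-999" contains no backspace, so the ([A-Z]|[0-9])+\b alternative can never match 'X', and no other alternative accepts a lone uppercase letter followed by '-'. '999' still matches through the final ([a-z]|[0-9])+ alternative. Regex testers such as regexr.com receive the pattern text directly, with no C# escaping applied, which is why it worked there.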
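A minimal sketch of the difference at the character level, assuming C# 9 or later for top-level statements; the variable names are illustrative only:

```csharp
using System;

// Regular literal: \b is the C# escape for backspace (U+0008), one character.
string regular = "\b";
// Verbatim literal: the backslash and 'b' are kept as two characters,
// which the regex engine reads as the word-boundary token.
string verbatim = @"\b";
// Escaped backslash: the same two characters as the verbatim form.
string escaped = "\\b";

Console.WriteLine(regular.Length);        // 1
Console.WriteLine((int)regular[0]);       // 8
Console.WriteLine(verbatim.Length);       // 2
Console.WriteLine(verbatim == escaped);   // True
```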

Escape the backslash (i.e., \\b), or use a verbatim string by prefixing it with the @ symbol:

        builder.Append(@"[A-Z][a-z]+|([A-Z]|[0-9])+\b|[A-Z]+(?=[A-Z])|([a-z]|[0-9])+");
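A minimal side-by-side sketch that runs both literals against "X-999". It assumes C# 9 or later for top-level statements and uses a hypothetical helper named `Values`:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Regular literal: the compiler turns \b into a backspace character (U+0008).
string buggy = "[A-Z][a-z]+|([A-Z]|[0-9])+\b|[A-Z]+(?=[A-Z])|([a-z]|[0-9])+";
// Verbatim literal: \b reaches the regex engine as a word boundary.
string fixedPattern = @"[A-Z][a-z]+|([A-Z]|[0-9])+\b|[A-Z]+(?=[A-Z])|([a-z]|[0-9])+";

// Hypothetical helper: collects the value of each match in order.
string[] Values(string input, string pattern) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

Console.WriteLine(string.Join(", ", Values("X-999", buggy)));        // 999
Console.WriteLine(string.Join(", ", Values("X-999", fixedPattern))); // X, 999
```

Strings such as "CityABCProcess Test" were unaffected by the bug because none of their matches relies on the ([A-Z]|[0-9])+\b alternative; the lone "X" is the first token that only that alternative can match.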

Upvotes: 4
