Liam

Reputation: 493

Regex split by same character within brackets

I have a long string, like so:

(A) name1, name2, name3, name3 (B) name4, name5, name7 (via name7) ..... (AA) name47, name47 (via name 46) (BB) name48, name49

Currently I split by "(", but that also treats the "(via" parts as new lines:

string[] lines = routesRaw.Split(new[] { "  (" }, StringSplitOptions.RemoveEmptyEntries);

How can I split only on the first kind of brackets, the letter labels? There is no AB, AC, AD, etc.; the characters within the brackets are always the same.

Thanks.

Upvotes: 0

Views: 48

Answers (1)

Wiktor Stribiżew

Reputation: 627022

A matching approach fits better than splitting here. The pattern needs a capturing group so that a backreference can match the same character zero or more times, and Regex.Split returns every captured substring together with the non-matching pieces.
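To illustrate that side effect of Regex.Split, here is a minimal sketch. The shortened input string is made up for the illustration, and the code assumes C# 9 or later so that top-level statements compile.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical shortened input, for illustration only.
string sample = "(A) x, y (BB) z";

// Group 1 is needed for the backreference \1, but Regex.Split also
// returns whatever each capturing group matched.
string[] parts = Regex.Split(sample, @"\(([A-Z])\1*\)");

foreach (string part in parts)
{
    Console.WriteLine($"[{part}]");
}
// The captured letters "A" and "B" appear between the real pieces, and
// there is an empty leading entry because the input starts with a match.
```

Filtering those extra entries out after the fact is possible, but a captured letter cannot be told apart from a genuine one-letter piece, which is why matching is the safer route.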

I suggest

(?s)(.*?)(?:\(([A-Z])\2*\)|\z)

Grab all non-empty Group 1 values. See the regex demo.

Details

  • (?s) - the inline form of the RegexOptions.Singleline (dotall) option, which makes . match newlines, too
  • (.*?) - Group 1: any 0 or more chars, but as few as possible
  • (?:\(([A-Z])\2*\)|\z) - a non-capturing group that matches:
    • \(([A-Z])\2*\) - (, then Group 2 capturing any uppercase ASCII letter, then any 0 or more repetitions of this captured letter and then )
    • | - or
    • \z - the very end of the string.
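The label part of the pattern can be checked in isolation. This is only a sketch: the ^ and $ anchors and the candidate strings are added for the check, and because the sub-pattern stands alone here its capturing group is number 1 instead of 2. It assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Anchored copy of the label sub-pattern; the anchors are added for this check only.
var label = new Regex(@"^\(([A-Z])\1*\)$");

// Hypothetical candidates: two valid labels, one mixed-letter label, one "via" group.
foreach (string candidate in new[] { "(A)", "(AA)", "(AB)", "(via name7)" })
{
    Console.WriteLine($"{candidate} -> {label.IsMatch(candidate)}");
}
// (A) and (AA) match; (AB) fails because \1 must repeat the captured letter;
// (via name7) fails because [A-Z] does not match a lowercase "v".
```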

In C#, use

var results = Regex.Matches(text, @"(?s)(.*?)(?:\(([A-Z])\2*\)|\z)")
        .Cast<Match>()
        .Select(x => x.Groups[1].Value)
        .Where(z => !string.IsNullOrEmpty(z))
        .ToList();

See the C# demo online.
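For a quick local check, the snippet above can be run against a shortened version of the string from the question. The input below is abbreviated by hand, and the Trim() call is an addition that is not part of the original answer; it just strips the spaces left around each piece. Top-level statements assume C# 9 or later.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Shortened sample modelled on the question's input.
string text = "(A) name1, name2, name3 (B) name4, name5, name7 (via name7) " +
              "(AA) name47 (via name 46) (BB) name48, name49";

var results = Regex.Matches(text, @"(?s)(.*?)(?:\(([A-Z])\2*\)|\z)")
        .Cast<Match>()
        .Select(m => m.Groups[1].Value.Trim())   // Trim() added for this sketch
        .Where(s => !string.IsNullOrEmpty(s))    // drops the empty Group 1 values
        .ToList();

foreach (string piece in results)
{
    Console.WriteLine(piece);
}
// Each printed line is one labelled section; the "(via ...)" parts stay
// inside the section they belong to.
```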

Upvotes: 1
