Baj B

Reputation: 65

Split a string by a substring except for brackets

How can we split the following string by "and"?

field = "a > b and b = 0 and (f = 1 and g = 2)"

Calling field.Split(" and ") returns 4 strings, with the bracketed expression broken across the last two:

a > b
b = 0
(f = 1 
g = 2)

I want just 3 strings, splitting only on the outer "and":

a > b
b = 0
(f = 1 and g = 2)

I tried various Regex options as well, but with no luck.

Upvotes: 4

Views: 498

Answers (1)

Wiktor Stribiżew

Reputation: 627022

Even if you have nested balanced parentheses, you can use the following pattern, which relies on .NET balancing groups:

\s*\band\b\s* # whole word and enclosed with 0+ whitespaces
(?=           # start of a positive lookahead:   
  (?: 
    [^()]*    # 0 or more chars other than ( and )
    \((?>[^()]+|(?<o>\()|(?<-o>\)))*(?(o)(?!))\)  # a (...) substring with nested parens support
  )*          # repeat the sequence of above two patterns 0 or more times
  [^()]*$     # 0 or more chars other than ( and ) and end of string  
)             # end of the positive lookahead

See the regex demo.

See a C# snippet:

using System.Text.RegularExpressions;

var text = "a > b and b = 0 and (f = 1 and (g = 2 and j = 68) and v = 566) and a > b and b = 0 and (f = 1 and g = 2)";
var pattern = @"(?x)
\s*\band\b\s* # whole word and enclosed with 0+ whitespaces
(?=           # start of a positive lookahead:   
  (?: 
    [^()]*    # 0 or more chars other than ( and )
    \((?>[^()]+|(?<o>\()|(?<-o>\)))*(?(o)(?!))\)  # a (...) substring with nested parens support
  )*          # repeat the sequence of above two patterns 0 or more times
  [^()]*$     # 0 or more chars other than ( and ) and end of string  
)             # end of the positive lookahead";
var results = Regex.Split(text, pattern);

Output:

a > b
b = 0
(f = 1 and (g = 2 and j = 68) and v = 566)
a > b
b = 0
(f = 1 and g = 2)
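
If a regex feels too dense to maintain, the same idea can be sketched without one: walk the string, track the parenthesis depth, and cut only where a whole-word "and" appears at depth 0. This is a swapped-in technique, not the pattern above. The names OuterAndSplitter, SplitOnOuterAnd, IsWholeWordAnd and IsWordChar are hypothetical, and the sketch assumes the parentheses are balanced and that a "word character" is a letter, digit or underscore (a simplification of what \b checks).

```csharp
using System;
using System.Collections.Generic;

static class OuterAndSplitter
{
    // Hypothetical helper (not from the original answer): splits on the word "and"
    // only when it occurs outside every pair of parentheses.
    // Assumes the parentheses in the input are balanced.
    public static List<string> SplitOnOuterAnd(string text)
    {
        var parts = new List<string>();
        int depth = 0;   // current parenthesis nesting level
        int start = 0;   // index where the part being collected begins

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c == '(') depth++;
            else if (c == ')') depth--;
            else if (depth == 0 && IsWholeWordAnd(text, i))
            {
                // Close the current part and trim the whitespace around it.
                parts.Add(text.Substring(start, i - start).Trim());
                start = i + 3;   // the next part begins right after "and"
                i += 2;          // the loop increment then lands on start
            }
        }

        parts.Add(text.Substring(start).Trim());
        return parts;
    }

    // True when "and" starts at index i and is not part of a longer word.
    private static bool IsWholeWordAnd(string s, int i)
    {
        if (string.CompareOrdinal(s, i, "and", 0, 3) != 0) return false;
        bool startsWord = i == 0 || !IsWordChar(s[i - 1]);
        bool endsWord = i + 3 == s.Length || !IsWordChar(s[i + 3]);
        return startsWord && endsWord;
    }

    // Simplified stand-in for the regex \w class.
    private static bool IsWordChar(char c) => char.IsLetterOrDigit(c) || c == '_';
}
```

Calling OuterAndSplitter.SplitOnOuterAnd(field) on the string from the question should give the three parts asked for. Unlike the regex, this version does not validate the parentheses: with unbalanced input the depth counter simply drifts, so treat it as a sketch for well-formed expressions only.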

Upvotes: 5
