Dan M

Reputation: 1272

regex: split by parentheses, ignoring nested parentheses inside quotes

My program parses a multi-row SQL VALUES string into an array of single-row strings.

Typical input string looks like:

(11,'-1','Service A (nested parentheses)','en') (22,'-2','Service B (nested parentheses)','en')

Desired output:

I have tried the following regexp, with only partial success:

\(('.*?'|.*?)\)

What would be the right way to handle this in regexp?

EDIT:

Upvotes: 5

Views: 6267

Answers (3)

zx81

Reputation: 41838

EDIT: After your comment about smilies, I'll suggest an alternative approach:

(?<=\()(?:'[^']*'|[,\s]+|\d+)+(?=\))

See demo. This assumes that your tokens are either strings delimited by single quotes, or digits. Is that correct?
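As a rough illustration, the pattern above could be applied in Java along these lines. The class name QuotedRowSplitter and the method split are hypothetical, and the sketch assumes, as the answer does, that every token is either a single-quoted string or a run of digits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class QuotedRowSplitter {
    // Quoted strings, separators (commas/whitespace) or digit runs,
    // preceded by "(" and followed by ")" without consuming either.
    private static final Pattern ROW =
            Pattern.compile("(?<=\\()(?:'[^']*'|[,\\s]+|\\d+)+(?=\\))");

    static List<String> split(String values) {
        List<String> rows = new ArrayList<>();
        Matcher m = ROW.matcher(values);
        while (m.find()) {
            rows.add(m.group()); // row content without the outer parentheses
        }
        return rows;
    }

    public static void main(String[] args) {
        String input = "(11,'-1','Service A (nested parentheses)','en') "
                + "(22,'-2','Service B (nested parentheses)','en')";
        split(input).forEach(System.out::println);
    }
}
```

Because quoted strings are consumed as whole tokens, a parenthesis inside quotes (including a smiley such as ':)') does not end the row early. Unquoted values other than digits, for example a bare negative number, would not match under this assumption.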

Original Answer

With one potential level of nesting, this will work in most regex flavors, including Java:

(?<=\()(?:[^()]+|\([^)]+\))+

See demo

How does it work?

  1. The lookbehind asserts that the previous character is an opening parenthesis (
  2. The non-capturing group with the + quantifier matches one or more of: (i) any number of characters that are not opening or closing parentheses, OR | (ii) full (parenthesized expressions)

If you want to make sure that the container is balanced, add a lookahead at the end:

(?<=\()(?:[^()]+|\([^)]+\))+(?=\))
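A minimal Java sketch of this variant is shown below. The class name NestedRowSplitter and the method split are made up for illustration, and the sketch assumes at most one level of balanced nesting inside each row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class NestedRowSplitter {
    // After "(", match runs of non-parenthesis characters or one complete
    // nested "(...)" group, and require a closing ")" right after the match.
    private static final Pattern ROW =
            Pattern.compile("(?<=\\()(?:[^()]+|\\([^)]+\\))+(?=\\))");

    static List<String> split(String values) {
        List<String> rows = new ArrayList<>();
        Matcher m = ROW.matcher(values);
        while (m.find()) {
            rows.add(m.group());
        }
        return rows;
    }

    public static void main(String[] args) {
        String input = "(11,'-1','Service A (nested parentheses)','en') "
                + "(22,'-2','Service B (nested parentheses)','en')";
        split(input).forEach(System.out::println);
    }
}
```

This pattern knows nothing about quotes, so an unbalanced parenthesis inside a quoted value (a smiley such as ':)') ends the match early. That limitation is what the quote-aware pattern in the edit at the top of this answer addresses.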

Upvotes: 2

adamdc78

Reputation: 1161

Pattern.compile("\\(((?:'[^']*'|[^'\\(\\)]+)+)\\)");

See it on RegexPlanet (click the Java link).

The meat of the regex is '[^']*'|[^'\(\)]+ - either any run of characters enclosed in single quotes, OR any run of characters that excludes single quotes and parentheses. This avoids having to use lookarounds, although the lookaround suggested by Casimir et Hippolyte may in fact be more efficient (I am not particularly familiar with the performance of lookarounds in Java).
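For context, here is one way the compiled pattern might be used to collect the rows, reading each row from capture group 1. The class name CaptureGroupSplitter and the method split are hypothetical; only the pattern itself comes from this answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class CaptureGroupSplitter {
    // Outer parentheses are matched literally; group 1 holds the row body,
    // built from quoted strings or runs without quotes and parentheses.
    private static final Pattern ROW =
            Pattern.compile("\\(((?:'[^']*'|[^'\\(\\)]+)+)\\)");

    static List<String> split(String values) {
        List<String> rows = new ArrayList<>();
        Matcher m = ROW.matcher(values);
        while (m.find()) {
            rows.add(m.group(1)); // content between the outer parentheses
        }
        return rows;
    }

    public static void main(String[] args) {
        String input = "(11,'-1','Service A (nested parentheses)','en') "
                + "(22,'-2','Service B (nested parentheses)','en')";
        split(input).forEach(System.out::println);
    }
}
```

Since parentheses are only allowed inside quoted strings here, an unquoted nested parenthesis would prevent a row from matching. Whether that matters depends on the input.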

Upvotes: 1

Kache

Reputation: 16687

With caveats:

/\((.*)\)/\1/

Will remove the surrounding parentheses, and

/\) \(/\r/g

Will put in newlines as in your example

Caveats:

  • This regex is in a generalized form, since you didn't specify which regex implementation you are using
  • This only works if the input closely matches your example
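If the target is Java, the two substitutions might translate roughly as follows. This is only a sketch: the class name ReplaceSplitter and the method toLines are hypothetical, the first pattern is written with an explicit capture group so the backreference has something to refer to, "\n" is assumed as the line separator, and the same caveats apply.

```java
// Hypothetical helper, not part of the original answer.
class ReplaceSplitter {
    static String toLines(String values) {
        // Step 1: drop the outermost parentheses. The greedy .* spans from
        // the first "(" to the last ")" of a single-line input.
        String stripped = values.replaceFirst("\\((.*)\\)", "$1");
        // Step 2: turn each ") (" row boundary into a newline.
        return stripped.replaceAll("\\) \\(", "\n");
    }

    public static void main(String[] args) {
        String input = "(11,'-1','Service A (nested parentheses)','en') "
                + "(22,'-2','Service B (nested parentheses)','en')";
        System.out.println(toLines(input));
    }
}
```

A quoted value that happens to contain ") (" would be split in the wrong place, which is one reason this only suits input that closely matches the example.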

Upvotes: 0
