Vinay Sathyanarayana

Reputation: 460

Common regex for multiple data format strings

My data is in the format below. The entries can appear in any order, with at most two entries per line; a line can also hold a single entry. I have spent the past two days trying to write a regex for the conditions below.

If I split the string on runs of spaces, values that contain spaces get split as well. If I split on ), lines where the plain value comes before the ) (as in lines 1 and 2) are not split at all. Any advice?

I came up with the regexes below, each of which captures a different part of the string.

\(([^)]+)\)
\(.+\)

However, I'm unable to build a regex that splits the data as shown below.

Note: each of the 7 lines below is a separate input string; they are not one multi-line input.

VALUE1                                PARAMETER(VALUE2)
VALUE3                                PARAMETER(VALUE4 WITH     SPACES)
PARAMETER(VALUE5)                     VALUE6
PARAMETER(VALUE7 WITH     SPACES)     VALUE8
PARAMETER(VALUE9 WITH     SPACES)     PARAMETER(VALUE10)
VALUE11                               VALUE12   
PARAMETER(VALUE13 WITH                                      SPACES)

These should be captured as:

VALUE1
PARAMETER(VALUE2)
VALUE3
PARAMETER(VALUE4 WITH     SPACES)
PARAMETER(VALUE5)
VALUE6
PARAMETER(VALUE7 WITH     SPACES)
VALUE8
PARAMETER(VALUE9 WITH     SPACES)
PARAMETER(VALUE10)
VALUE11
VALUE12
PARAMETER(VALUE13 WITH                                      SPACES)

Upvotes: 4

Views: 92

Answers (2)

Wiktor Stribiżew

Reputation: 626691

You need to use variable-width look-arounds to check that the whitespace is not inside parentheses:

(?<!\([^)]*)\s+(?![^(]*\))

See RegexStorm Demo

Regex Explanation:

  1. (?<!\([^)]*) - A negative look-behind that checks that the whitespace is NOT preceded by an opening ( followed by any number of characters other than ) (i.e. the whitespace does not come after an unclosed ()
  2. \s+ - Whitespace that will be consumed and left out of the final array after the split (mind that you can restrict it to spaces only with the \p{Zs} Unicode category class if you want to exclude tabs and other whitespace symbols that \s matches)
  3. (?![^(]*\)) - A negative look-ahead making sure the whitespace is not followed by any number of characters other than ( and then a ) (i.e. there is no closing ) ahead before the next ().

Points 1 and 3 make sure we are checking both sides of whitespace for parentheses.

You can use this regex with Regex.Split().

using System.Text.RegularExpressions;

// Split on whitespace that is not inside parentheses
var rx = new Regex(@"(?<!\([^)]*)\s+(?![^(]*\))");
var txt = @"YOUR TEXT";
var result = rx.Split(txt);
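
For readers outside .NET: Python's built-in re module rejects variable-width look-behinds, so the pattern above cannot be used there unchanged. Below is a minimal sketch of a swapped-in variant that keeps only the look-ahead half. It assumes the parentheses are never nested and always balanced, as in the question's sample lines; split_line is a hypothetical helper name, not something from the question.

```python
import re

# Look-ahead-only variant: split on whitespace unless a ")" comes up
# before the next "(" (which would mean we are inside parentheses).
# Assumption: parentheses are balanced and never nested.
SPLIT_OUTSIDE_PARENS = re.compile(r"\s+(?![^(]*\))")


def split_line(line):
    """Split one input line into its entries (hypothetical helper)."""
    # strip() drops leading/trailing padding so no empty entries appear;
    # the filter also guards against a completely blank line.
    return [part for part in SPLIT_OUTSIDE_PARENS.split(line.strip()) if part]


if __name__ == "__main__":
    samples = [
        "VALUE1                                PARAMETER(VALUE2)",
        "PARAMETER(VALUE7 WITH     SPACES)     VALUE8",
        "VALUE11                               VALUE12   ",
    ]
    for sample in samples:
        print(split_line(sample))
```

With unbalanced input (for example a stray closing parenthesis later in the line) the look-ahead alone can suppress a split that should happen, which is the case the look-behind in the .NET pattern additionally guards against.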


Upvotes: 2

samgak

Reputation: 24417

Try this regex:

(\S+\([^\)]+\)|\S+(?!\())

Demo

\S+\([^\)]+\) matches non-whitespace, then an open bracket, then anything except a close bracket, then a close bracket.

\S+(?!\() matches non-whitespace, with a negative lookahead for an open bracket.
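
As a quick check, the pattern can be run against the seven sample lines from the question. This is only a sketch, written in Python rather than the .NET used elsewhere in the thread. The outer capturing group is dropped so that re.findall returns whole matches, and extract_entries is a hypothetical helper name.

```python
import re

# Same two alternatives as above, without the outer capturing group:
#   1. NAME(...)  - non-whitespace, "(", anything but ")", then ")"
#   2. bare value - non-whitespace not followed by "("
ENTRY = re.compile(r"\S+\([^)]+\)|\S+(?!\()")


def extract_entries(line):
    """Return the entries found on one input line (hypothetical helper)."""
    return ENTRY.findall(line)


if __name__ == "__main__":
    lines = [
        "VALUE1                                PARAMETER(VALUE2)",
        "VALUE3                                PARAMETER(VALUE4 WITH     SPACES)",
        "PARAMETER(VALUE5)                     VALUE6",
        "PARAMETER(VALUE7 WITH     SPACES)     VALUE8",
        "PARAMETER(VALUE9 WITH     SPACES)     PARAMETER(VALUE10)",
        "VALUE11                               VALUE12   ",
        "PARAMETER(VALUE13 WITH                                      SPACES)",
    ]
    for line in lines:
        for entry in extract_entries(line):
            print(entry)
```

Because this approach matches the entries instead of splitting on the gaps, trailing spaces such as those after VALUE12 need no extra trimming.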

Upvotes: 1
