Daniel Jeney

Reputation: 506

Java regex: split on whitespace not preceded by '=' or ','

I would like to split the string "x= 2-3 y=3 z= this, that" on one or more whitespace characters that are not preceded by a '=' or a ','. The result should be group one: "x= 2-3", group two: "y=3", group three: "z= this, that". I have an expression that kind of does it, but it only works if the = or , is followed by exactly one whitespace character.

(?<![,=])\\s+ 
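To see why the expression above breaks with several spaces, here is a small sketch. The class and method names are hypothetical, and the multi-space input is the one used in the answers below. The lookbehind only protects the first space after '=' or ','; the next space is preceded by a space, so `\s+` still matches from there.

```java
import java.util.Arrays;

public class AskerRegexDemo {
    // The asker's pattern: whitespace not immediately preceded by '=' or ','
    static final String ASKER_REGEX = "(?<![,=])\\s+";

    static String[] split(String s) {
        return s.split(ASKER_REGEX);
    }

    public static void main(String[] args) {
        // One space after '=' and ',': the split works as intended
        System.out.println(Arrays.toString(split("x= 2-3 y=3 z= this, that")));
        // Several spaces: only the first space is protected by the lookbehind,
        // the following ones are preceded by a space and therefore still match
        System.out.println(Arrays.toString(split("x=   2-3   y=3 z=   this,   that")));
    }
}
```

The second call yields six pieces ("x= ", "2-3", "y=3", "z= ", "this, ", "that") instead of three.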

Upvotes: 5

Views: 385

Answers (4)

The fourth bird

Reputation: 163297

If you want to use a lookbehind, you could assert that what is on the left matches a pattern such as x= 2-3, and then match the whitespace characters that follow.

Use a negated character class [^\\h=,] to match any character except the ones listed (horizontal whitespace, = and ,).

(?<=[^\\h=,]=\\h{0,100}[^\\h=,]{1,100})\\h+


In Java you have to double the backslashes in string literals, and you could use \h+ to match one or more horizontal whitespace characters instead of \s.
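A minimal sketch of the practical difference between \s and \h: \s also matches line terminators, while \h (available since Java 8) only matches horizontal whitespace such as spaces and tabs. The input with an embedded newline and the class name are assumptions chosen for illustration.

```java
import java.util.Arrays;

public class HorizontalWhitespaceDemo {
    // \s matches any whitespace, including line terminators such as '\n'
    static String[] splitOnAnyWhitespace(String s) {
        return s.split("\\s+");
    }

    // \h matches only horizontal whitespace (spaces, tabs and similar)
    static String[] splitOnHorizontalWhitespace(String s) {
        return s.split("\\h+");
    }

    public static void main(String[] args) {
        String s = "x=1 y=2\nz=3";
        System.out.println(Arrays.toString(splitOnAnyWhitespace(s)));        // three parts
        System.out.println(Arrays.toString(splitOnHorizontalWhitespace(s))); // newline is not a split point
    }
}
```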

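Java does not support an unbounded (infinite-width) lookbehind, but it does support a bounded (finite) width, which is why the pattern uses the quantifiers {0,100} and {1,100} instead of * and +.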

For example

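public class SplitExample {
    public static void main(String[] args) {
        String s = "x=   2-3   y=3 z=   this,   that";
        // Split on horizontal whitespace only when the text on the left looks like "key=value"
        String regex = "(?<=[^\\h=,]=\\h{0,100}[^\\h=,]{1,100})\\h+";
        String[] parts = s.split(regex);
        for (String part : parts) {
            System.out.println(part);
        }
    }
}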

Output

x=   2-3
y=3
z=   this,   that

Upvotes: 0

JvdV

Reputation: 75840

Thinking the other way around (looking forward instead of backwards), would the following do the job for you?

\\s+(?=\\S*=)
  • \\s+ - one or more whitespace characters
  • (?=\\S*=) - positive lookahead to make sure the whitespace is followed by zero or more non-whitespace characters and then a literal equals sign.
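The lookahead approach above can be tried with a short program. This is a sketch that assumes the multi-space input used by the other answers; the class name is hypothetical.

```java
public class LookaheadSplitDemo {
    // Split on whitespace only when the following run of non-whitespace
    // characters contains an '=' (i.e. the next token starts a new "name=")
    static String[] split(String s) {
        return s.split("\\s+(?=\\S*=)");
    }

    public static void main(String[] args) {
        String s = "x=   2-3   y=3 z=   this,   that";
        for (String part : split(s)) {
            System.out.println(part);
        }
    }
}
```

This prints x=   2-3, y=3 and z=   this,   that on separate lines. Note that a value which itself contains an '=' (for example "z=   this=that") would also trigger a split, since the lookahead only checks for an '=' in the next token.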

Upvotes: 2

ernest_k

Reputation: 45319

This one splits on whitespace that is followed by one or more characters that are neither = nor whitespace, and then an =: "\\s+(?=[^=\\s]+=)":

jshell> "x=   2-3   y=3 z=   this,   that".split("\\s+(?=[^=\\s]+=)")
$10 ==> String[3] { "x=   2-3", "y=3", "z=   this,   that" }

Upvotes: 1

Tim Biegeleisen
Tim Biegeleisen

Reputation: 521194

It might be difficult to express clean splitting logic with a regex here. Instead, I would use a Pattern and Matcher to find each key/value pair, with the regex pattern:

[^=\s]+\s*=.*?(?=[^=\s]+\s*=|$)

Sample script:

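import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchExample {
    public static void main(String[] args) {
        String input = "x=   2-3   y=3 z=   this,   that";
        // Each match is a "name=" plus everything up to the next "name=" or the end
        Pattern r = Pattern.compile("[^=\\s]+\\s*=.*?(?=[^=\\s]+\\s*=|$)");
        Matcher m = r.matcher(input);
        while (m.find()) {
            System.out.println("match: " + m.group(0));
        }
    }
}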

This prints:

match: x=   2-3   
match: y=3 
match: z=   this,   that

Here is an explanation of the regex pattern:

[^=\s]+           match a variable
\s*               followed by optional whitespace
=                 match =
.*?               lazily consume characters, up to the nearest
(?=
    [^=\s]+\s*=   next variable followed by =
    |             or
    $             the end of the input (covers the z= case)
)
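As the output shows, the lazy .*? keeps the whitespace that sits before the next variable. A minimal sketch, assuming you want the pairs collected into a list without that trailing whitespace; the class name and the pairs helper are hypothetical, and the pattern is the one from this answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeyValueMatcher {
    // Same pattern as above: a name, optional whitespace, '=', then everything
    // lazily up to the next "name=" or the end of the input
    private static final Pattern PAIR =
            Pattern.compile("[^=\\s]+\\s*=.*?(?=[^=\\s]+\\s*=|$)");

    // Hypothetical helper: collects every match and trims the whitespace
    // that .*? leaves in front of the next name
    static List<String> pairs(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            result.add(m.group().trim());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(pairs("x=   2-3   y=3 z=   this,   that"));
    }
}
```

Using a precompiled static Pattern avoids recompiling the regex on every call, which matters if the helper is used in a loop.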

Upvotes: 0
