tom91136

Reputation: 8962

Java String.split() splits on every character instead of on the given regular expression

I have a string that I want to split into an array:

SEQUENCE: 1A→2B→3C

I tried the following regular expression:

((.*\s)|([\x{2192}]*))

1. \x{2192} is the arrow character (→).
2. There is a space after the colon; I use it as the anchor for matching the first part.

It works in regex testers (Patterns on OS X),

but in Java, split() produces this:

[, , 1, A, , 2, B, , 3, C]

How can I get the following instead?

[1A,2B,3C]

This is the test code:

import java.util.Arrays;

String str = "SEQUENCE: 1A→2B→3C"; // note: there is a space after the colon
System.out.println(Arrays.toString(str.split("(.*\\s)|([\\x{2192}]*)")));

Upvotes: 0

Views: 942

Answers (2)

C. K. Young

Reputation: 223083

As noted in Richard Sitze's post, the main problem with the regex is that it should use + rather than *. There are also a few further simplifications you can make:

  • Instead of \\x{2192}, use \u2192 (a Java Unicode escape, which the compiler turns into a literal arrow inside the string). Because it is a single character, you don't need to put it in a character class ([...]); you can write \u2192+ directly.
  • Also, because | binds more loosely than .*\\s and \u2192+, you won't need the parentheses there either. So your final expression is simply ".*\\s|\u2192+".
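A minimal runnable check of that final expression follows. One detail worth knowing: "SEQUENCE: " is matched at index 0 with non-zero length, so split() still returns an empty first element ([, 1A, 2B, 3C]) rather than exactly [1A, 2B, 3C]. The class SplitArrows and the helper splitSequence (remove the prefix with replaceFirst, then split only on the arrows) are hypothetical illustrations, not code from either answer.

```java
import java.util.Arrays;

public class SplitArrows {
    // The answer's regex: the "SEQUENCE: " prefix, or one or more arrows.
    // The Java compiler turns the \u2192 escape into a literal arrow,
    // so the regex string itself contains the U+2192 character.
    static String[] splitWithPrefixRegex(String s) {
        return s.split(".*\\s|\u2192+");
    }

    // Hypothetical helper: strip everything up to the last whitespace first,
    // then split only on runs of arrows, so no empty leading element appears.
    static String[] splitSequence(String s) {
        return s.replaceFirst(".*\\s", "").split("\u2192+");
    }

    public static void main(String[] args) {
        // Same input as the question; the arrow is written as an escape
        // so the file compiles regardless of the source encoding.
        String str = "SEQUENCE: 1A\u2192" + "2B\u2192" + "3C";

        // The prefix match starts at index 0 and is not zero-width,
        // so split() keeps an empty string in front of it.
        System.out.println(Arrays.toString(splitWithPrefixRegex(str))); // [, 1A, 2B, 3C]
        System.out.println(Arrays.toString(splitSequence(str)));        // [1A, 2B, 3C]
    }
}
```

Another option is to keep the answer's regex and skip the first element when it is empty; stripping the prefix first keeps the split pattern down to the separator alone.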

Upvotes: 5

Richard Sitze

Reputation: 8463

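[\x{2192}]* matches zero or more arrows, so it also matches the empty string at every position. That is why you are splitting on every character (you are splitting on the empty string). Change the * to + so that at least one arrow is required.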
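To see the empty matches described above, you can list every match the pattern finds with Matcher.find(). This is a small sketch; the class EmptyMatchDemo and the method matchSpans are hypothetical names. Each match is shown as start-end, so a zero-length match appears as, for example, 10-10.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyMatchDemo {
    // Hypothetical helper: returns every match of regex in input as "start-end".
    // A match where start == end is zero-length (it matched the empty string).
    static List<String> matchSpans(String regex, String input) {
        List<String> spans = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            spans.add(m.start() + "-" + m.end());
        }
        return spans;
    }

    public static void main(String[] args) {
        String str = "SEQUENCE: 1A\u2192" + "2B\u2192" + "3C";

        // Original pattern: with *, the arrow alternative also matches the empty
        // string, producing a zero-length match between most characters.
        System.out.println(matchSpans("(.*\\s)|([\\x{2192}]*)", str));
        // [0-10, 10-10, 11-11, 12-13, 13-13, 14-14, 15-16, 16-16, 17-17, 18-18]

        // With +, the arrow alternative needs at least one arrow.
        System.out.println(matchSpans("(.*\\s)|([\\x{2192}]+)", str));
        // [0-10, 12-13, 15-16]
    }
}
```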

Upvotes: 5
