Reputation: 912
I am puzzled by the split method with a regex in Java. It is a rather theoretical question that popped up, and I can't figure it out.
I found this answer: "Java split by \\S", but the advice to use \\s instead of \\S does not explain what is happening here.
Why does quote.split("\\S") have 2 results in case A and 8 in case B?
case A)
String quote = " x xxxxxx";
String[] words = quote.split("\\S");
System.out.print("\\S >>\t");
for (String word : words) {
    System.out.print(":" + word);
}
System.out.println(words.length);
Result:
\\S >> : : 2
case B)
String quote = " x xxxxxx ";
String[] words = quote.split("\\S");
System.out.print("\\S >>\t");
for (String word : words) {
    System.out.print(":" + word);
}
System.out.println(words.length);
Result:
\\S >> : : :::::: 8
It would be wonderful to understand what happens here. Thanks in advance.
Upvotes: 1
Views: 1608
Reputation: 981
As Jongware noticed, the documentation for String.split(String) says:
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
So it works somewhat like this:
"a:b:::::".split(":") === removeTrailing([a,b,,,,,]) === [a,b]
"a:b:::::c".split(":") === removeTrailing([a,b,,,,,c]) === [a,b,,,,,c]
And in your example:
" x xxxxxx".split("\\S") === removeTrailing([ , ,,,,,,]) === [ , ]
" x xxxxxx ".split("\\S") === removeTrailing([ , ,,,,,, ]) === [ , ,,,,,, ]
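The model above can be sketched in runnable Java, in case it helps to see it execute. Note that removeTrailing is a hypothetical helper written only for illustration (it is not a JDK method), and the sketch relies on the documented behaviour that a negative limit makes split keep every piece, including trailing empty ones.

```java
import java.util.Arrays;

class RemoveTrailingSketch {
    // Hypothetical helper mirroring the removeTrailing(...) notation:
    // drops empty strings from the end of the array and nothing else.
    static String[] removeTrailing(String[] parts) {
        int end = parts.length;
        while (end > 0 && parts[end - 1].isEmpty()) {
            end--;
        }
        return Arrays.copyOf(parts, end);
    }

    public static void main(String[] args) {
        String[] inputs = {" x xxxxxx", " x xxxxxx "};
        for (String input : inputs) {
            // Negative limit: keep all pieces, trailing empty strings included.
            String[] all = input.split("\\S", -1);
            String[] modelled = removeTrailing(all);
            // One-argument split: behaves as if the limit were zero.
            String[] actual = input.split("\\S");
            System.out.println(all.length + " pieces, " + modelled.length
                    + " after trimming, same as split(\"\\\\S\"): "
                    + Arrays.equals(modelled, actual));
        }
    }
}
```

For the two strings from the question, both inputs produce 8 pieces before trimming; only the first one loses its six trailing empty strings, because in the second one the last piece is a space rather than an empty string.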
To collapse multiple delimiters into one, use the \S+ pattern (written "\\S+" in a Java string literal).
" x xxxxxx".split("\\S+") === removeTrailing([ , ,]) === [ , ]
" x xxxxxx ".split("\\S+") === removeTrailing([ , , ]) === [ , , ]
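A small sketch comparing the two patterns on the strings from the question may make the difference easier to see. The class and the countPieces helper are made up for this illustration.

```java
class CollapseRunsSketch {
    // Hypothetical helper: number of pieces returned by the one-argument split.
    static int countPieces(String input, String regex) {
        return input.split(regex).length;
    }

    public static void main(String[] args) {
        String[] inputs = {" x xxxxxx", " x xxxxxx "};
        for (String input : inputs) {
            // "\\S" treats every non-whitespace character as its own delimiter;
            // "\\S+" treats a whole run of them as a single delimiter.
            System.out.println("\"" + input + "\": \\S -> "
                    + countPieces(input, "\\S") + ", \\S+ -> "
                    + countPieces(input, "\\S+"));
        }
    }
}
```

With \S+ the run xxxxxx counts as one delimiter, so the second string yields 3 pieces instead of 8, while the first still yields 2 because its single trailing empty string is dropped.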
As suggested in the comments, to keep the trailing empty strings we can use the overloaded version of the split method (String.split(String, int)) with a negative number passed as the limit.
"a:b:::::".split(":", -1) === [a,b,,,,,]
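As a quick check, something like the following should show the difference between the two overloads. The class and helper names are invented for this sketch; only String.split itself comes from the JDK.

```java
import java.util.Arrays;

class NegativeLimitSketch {
    // Hypothetical helper: number of pieces for a given limit.
    static int countPieces(String input, String regex, int limit) {
        return input.split(regex, limit).length;
    }

    public static void main(String[] args) {
        String input = "a:b:::::";
        // Limit 0 is what the one-argument split uses: trailing empties are dropped.
        System.out.println(Arrays.toString(input.split(":", 0))
                + " length " + countPieces(input, ":", 0));
        // A negative limit keeps the trailing empty strings.
        System.out.println(Arrays.toString(input.split(":", -1))
                + " length " + countPieces(input, ":", -1));
    }
}
```

Here the limit of 0 gives 2 elements and the limit of -1 gives 7, since the six colons cut the input into seven pieces.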
Upvotes: 3