Reputation: 5931
I wanted to get a list of numbers from a sequence of characters (that is, letters and digits), so I wrote this code:
class A {
    public static void main(String[] args) {
        String msg = "aa811b22";
        String[] numbers = msg.split("\\D+");
        for (int i = 0; i < numbers.length; i++) {
            System.out.println(">" + numbers[i] + "<");
        }
    }
}
Surprisingly, it prints:
$ java A
><
>811<
>22<
Ok, so somehow it matched an empty string... I explained to myself that "" (the empty string) actually matches the non-digit regexp \D+. Nothing is NOT a digit... right? (However, why did it return only 1 empty string? There is an infinite (∞) number of empty strings inside any string.)
To check this, I tried extracting the words from the same string:
class A {
    public static void main(String[] args) {
        String msg = "aa811b22";
        String[] words = msg.split("\\d+");
        for (int i = 0; i < words.length; i++) {
            System.out.println(">" + words[i] + "<");
        }
    }
}
which actually prints what I expected (no empty strings returned):
$ java A
>aa<
>b<
But then I ran a few more tests that completely confused me:
System.out.println("a".split("\\D+").length);
// => 0 (WHY NOT 1? Shouldn't an empty string be here?!)
System.out.println("a1".split("\\D+").length);
// => 2 (so now it splits into an empty string and "1")
System.out.println("1a".split("\\D+").length);
// => 1 (now it returns just the expected "1" string)
So my questions are:
1. Why does "a".split("\\D+").length return 0?
2. Why is "a1".split("\\D+").length 2 (and not 1)?
3. How does "1a".split("\\D+").length differ from "a1".split("\\D+").length when splitting?
Upvotes: 1
Views: 1313
Reputation: 178263
'a' is not a digit, so the leading "aa" is a separator. There are elements to return on either side of a separator, and the empty string is what lies to the left of "aa". If the separator were ",", then out of the string ",a,b" you would expect 3 elements: "", "a", and "b". Here, "aa" is the separator, just like "," in my example.

Why does "a".split("\\D+").length return 0?
'a' is not a digit, so it's a separator. The presence of the separator means that two substrings are split out of the original String, both empty strings, one on either side of the "a". However, the single-argument split method discards trailing empty strings. They're all empty, so they're all discarded, and the length is 0.

Why is "a1".split("\\D+").length 2 (and not 1)?
Only trailing empty strings are discarded, so the elements are "" and "1".

How does "1a".split("\\D+").length differ from "a1".split("\\D+").length?
"1a" will have one trailing empty string discarded, but "a1" will not have its empty string discarded, because it is leading rather than trailing.
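To see the leading-versus-trailing behavior in one place, here is a minimal sketch that runs the strings from the question through the same split. The class name SplitTrailingDemo and the helper splitOnNonDigits are made up for illustration; only String.split and Arrays.toString are real library calls.

```java
import java.util.Arrays;

public class SplitTrailingDemo {
    // Hypothetical helper (not from the question): split on runs of non-digits
    // using the one-argument split, which drops trailing empty strings.
    static String[] splitOnNonDigits(String input) {
        return input.split("\\D+");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"aa811b22", "a", "a1", "1a"}) {
            String[] parts = splitOnNonDigits(s);
            // Print the element count next to the elements themselves.
            System.out.println(s + " -> " + parts.length + " " + Arrays.toString(parts));
        }
    }
}
```

This should print 3 elements for "aa811b22" (a leading empty string, then 811 and 22), 0 for "a", 2 for "a1", and 1 for "1a", matching the counts in the question.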
Upvotes: 2
Reputation: 54551
It's not matching an empty string. Rather, it's matching the "aa" at the beginning of your string as a delimiter. The first element is empty because there is only an empty string before the first delimiter. In contrast, no empty string is returned for trailing delimiters, as mentioned in the documentation for split():
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
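If you want to see the trailing empty strings that a limit of zero throws away, the two-argument split with a negative limit keeps them. The sketch below compares the two; the class name SplitLimitDemo and the helper splitKeepingTrailing are hypothetical names used only for this example.

```java
import java.util.Arrays;

public class SplitLimitDemo {
    // Hypothetical helper: a negative limit tells split to keep trailing empty strings.
    static String[] splitKeepingTrailing(String input) {
        return input.split("\\D+", -1);
    }

    public static void main(String[] args) {
        // One-argument split behaves like limit 0: the trailing "" is dropped.
        System.out.println(Arrays.toString("1a".split("\\D+")));           // [1]
        // With limit -1 the trailing "" after the delimiter "a" is kept.
        System.out.println(Arrays.toString(splitKeepingTrailing("1a")));   // [1, ]
        // For "a" both the leading and the trailing "" survive.
        System.out.println(splitKeepingTrailing("a").length);              // 2
    }
}
```

So the 0 you got for "a" comes from the limit-zero cleanup, not from the regex failing to match.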
Upvotes: 1