Regex / String.split not working as expected

Question

I am receiving data in this format; each of these three lines is its own string:

   0 -rw-------    1       167 Tue Nov 13 10:39:28 2012 .bash_history
   0 -rw-r--r--    1        40 Wed Nov 28 12:18:03 2012 aaa.txt
22290 -rw-r--r--    1  22824944 Tue Jan 15 15:05:58 2013 a.bin

I tried using this regex to split it into tokens delimited by white space.

String[] tokens = newParts[i].split("\\s{1,}");

However, this always produces an empty string as the first token for the first two lines, while the third line correctly gets 22290 as its first token. All the other tokens are as I want. Why is only the first token of the first two lines wrong?

Dan Berindei · Accepted Answer

To quote the Pattern.split documentation:

The array returned by this method contains each substring of the input sequence that is terminated by another subsequence that matches this pattern or is terminated by the end of the input sequence.

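So if your string starts with the separator, your first element will be an empty string. That is the case for your first two lines, which begin with spaces; the third line begins with 22290, so nothing precedes its first token. Likewise, if your string ends with the separator, your last element will be an empty string.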

Edit: Actually, split(regex) calls split(regex, 0), which explicitly discards trailing empty strings. It does nothing about a leading empty string, though.
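A minimal sketch of that asymmetry, using a shortened sample string rather than the real input. The class and method names here (SplitLeadingTrailingDemo, splitDefault, splitKeepTrailing) are hypothetical and only for illustration; the split calls themselves are the standard String.split(String) and String.split(String, int).

```java
import java.util.Arrays;

class SplitLeadingTrailingDemo {
    // One-argument split: behaves like split(regex, 0), so trailing empty strings are dropped.
    static String[] splitDefault(String line) {
        return line.split("\\s{1,}");
    }

    // A negative limit keeps trailing empty strings, which makes the difference visible.
    static String[] splitKeepTrailing(String line) {
        return line.split("\\s{1,}", -1);
    }

    public static void main(String[] args) {
        // Hypothetical sample with whitespace at both ends.
        String line = "   0 aaa.txt   ";

        // The leading empty string is kept, the trailing one is discarded.
        System.out.println(Arrays.toString(splitDefault(line)));      // prints [, 0, aaa.txt]

        // With limit -1 the trailing empty string is kept as well.
        System.out.println(Arrays.toString(splitKeepTrailing(line))); // prints [, 0, aaa.txt, ]
    }
}
```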

It should work as you expect if you call trim() on the input first.
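Something along these lines should do it, applied to the first line from the question. The class and method names (TrimThenSplitDemo, tokenize) are made up for the example, and "\\s+" is just a shorter way of writing "\\s{1,}".

```java
class TrimThenSplitDemo {
    // Strip leading/trailing whitespace first, then split on runs of whitespace.
    static String[] tokenize(String line) {
        return line.trim().split("\\s+");
    }

    public static void main(String[] args) {
        // First sample line from the question, including its leading spaces.
        String line = "   0 -rw-------    1       167 Tue Nov 13 10:39:28 2012 .bash_history";
        String[] tokens = tokenize(line);

        System.out.println(tokens[0]);     // prints 0
        System.out.println(tokens.length); // prints 10
    }
}
```

One thing to watch for: a line that is empty or all whitespace still yields a one-element array containing an empty string, so you may want to check for that separately.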
