Reputation: 5974
I am receiving data in this format; each of these three lines is its own string:
0 -rw------- 1 167 Tue Nov 13 10:39:28 2012 .bash_history
0 -rw-r--r-- 1 40 Wed Nov 28 12:18:03 2012 aaa.txt
22290 -rw-r--r-- 1 22824944 Tue Jan 15 15:05:58 2013 a.bin
I tried using this regex to split it into tokens delimited by white space.
String[] tokens = newParts[i].split("\\s{1,}");
However, this always produces an empty string as the first token for the first two lines, while it correctly sets 22290 as the first token for the third line. All the other tokens are as I want. Why is only the first token of the first two lines wrong?
Upvotes: 1
Views: 135
Reputation: 7194
To quote the Pattern.split documentation:
The array returned by this method contains each substring of the input sequence that is terminated by another subsequence that matches this pattern or is terminated by the end of the input sequence.
So if your string starts with the separator, your first element will be an empty string. The same way, if your string ends with the separator, your last element will be an empty string.
Edit: Actually split(string) calls split(string, 0), which explicitly discards trailing empty elements. But it doesn't do anything about empty leading elements.
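A small sketch of that asymmetry may help. The class name, helper names, and sample input below are made up for illustration, and the input assumes the real strings carry leading and trailing whitespace (which the question does not show directly). It compares the default split with a negative limit, which keeps trailing empty strings as well.

```java
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative, not from the question.
class SplitEdgesDemo {

    // Default behaviour: same as split(regex, 0), trailing empty strings are dropped.
    static String[] splitDefault(String s) {
        return s.split("\\s+");
    }

    // A negative limit keeps trailing empty strings too.
    static String[] splitKeepTrailing(String s) {
        return s.split("\\s+", -1);
    }

    public static void main(String[] args) {
        // Assumed input: whitespace on both ends.
        String line = "  0 -rw-r--r-- 1 40  ";

        // Leading empty token is kept, trailing one is discarded.
        System.out.println(Arrays.toString(splitDefault(line)));

        // Both the leading and the trailing empty token are kept.
        System.out.println(Arrays.toString(splitKeepTrailing(line)));
    }
}
```

With the default call, only the empty token at the start survives, which matches the symptom described in the question.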
It should work as you expect if you call trim() on the input first.
Upvotes: 1
Reputation: 9559
Before splitting the string, you could .trim() it to remove leading and trailing whitespace. That should prevent unwanted extra tokens.
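As a minimal sketch of the trim-then-split approach, assuming the first two input lines really do begin with whitespace (the helper name tokenize and the class name are hypothetical):

```java
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative, not from the question.
class TrimSplitDemo {

    // Strip leading/trailing whitespace, then split on runs of whitespace.
    // "\\s+" is equivalent to the "\\s{1,}" used in the question.
    static String[] tokenize(String line) {
        return line.trim().split("\\s+");
    }

    public static void main(String[] args) {
        // Assumed input: the leading spaces are a guess at what the real data looks like.
        String line = "    0 -rw------- 1 167 Tue Nov 13 10:39:28 2012 .bash_history";
        String[] tokens = tokenize(line);

        // The first token is now the leading number rather than an empty string.
        System.out.println(Arrays.toString(tokens));
    }
}
```

One caveat: if a line is empty or whitespace-only, trim() leaves an empty string and split returns a single empty token, so such lines may need to be skipped separately.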
Upvotes: 1