Why does Java's String.split() produce different results for a string defined in code and a string read from a file when numbers are involved? Specifically, I have a file called "test.txt" that contains characters and numbers separated by spaces:
G H 5 4
The split method does not split on all of the spaces as expected: "5 4" comes back as a single element. But if a string variable with the same characters and numbers separated by spaces is created in code, split() returns four individual strings, one for each character and number. The code below demonstrates the difference:
```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

public class SplitNumber {
    // Read the first line of a text file and split it on whitespace,
    // then do the same for an identical-looking string literal.
    public static void main(String[] args) {
        File file = new File("test.txt");
        try (BufferedReader bufferedReader = new BufferedReader(new FileReader(file))) {
            String firstLine = bufferedReader.readLine();
            if (firstLine != null) {
                String[] firstLineNumbers = firstLine.split("\\s+");
                System.out.println("First line array length: " + firstLineNumbers.length);
                for (String token : firstLineNumbers) {
                    System.out.println(token);
                }
            }
        } catch (IOException exception) {
            System.out.println("IOException occurred");
            exception.printStackTrace();
        }

        String numberString = "G H 5 4";
        String[] numbers = numberString.split("\\s+");
        System.out.println("Numbers array length: " + numbers.length);
        for (String token : numbers) {
            System.out.println(token);
        }
    }
}
```
The result is:

```
First line array length: 3
G
H
5 4
Numbers array length: 4
G
H
5
4
```
Why are the numbers from the file not split the same way as the identical string defined in code?
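The output above is consistent with the file containing a look-alike space between "5" and "4". A minimal sketch for checking this is to print the code point of every character in the line. It assumes, hypothetically, that the hidden character is a no-break space (U+00A0); the class name HiddenWhitespaceDemo and the helper codePoints are illustrative only.

```java
import java.util.Arrays;

public class HiddenWhitespaceDemo {

    // Lists every character of the line as "U+XXXX" so that
    // look-alike spaces become visible.
    static String codePoints(String line) {
        StringBuilder sb = new StringBuilder();
        line.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Hypothetical: the file line has a no-break space (U+00A0)
        // between "5" and "4" instead of an ordinary space (U+0020).
        String fromFile = "G H 5\u00A04";
        String inCode = "G H 5 4";

        // By default \s matches only [ \t\n\x0B\f\r], so U+00A0 is not a separator.
        String[] fileTokens = fromFile.split("\\s+");
        String[] codeTokens = inCode.split("\\s+");
        System.out.println("From file: " + fileTokens.length + " " + Arrays.toString(fileTokens));
        System.out.println("In code:   " + codeTokens.length + " " + Arrays.toString(codeTokens));

        // The dump shows U+00A0 where U+0020 was expected.
        System.out.println(codePoints(fromFile));
    }
}
```

Running it reports 3 tokens for the simulated file line and 4 for the literal, matching the question's output, and the code point dump ends with U+0035 U+00A0 U+0034.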
Based on feedback, I changed the regex to split("[\\s\\h]+"), which resolved the issue: the numbers from the file were split correctly. That clearly indicated that the text file contained a whitespace-like character that \s does not match. (In Java, \s matches only [ \t\n\x0B\f\r] by default, while \h also matches horizontal whitespace such as the no-break space U+00A0.) I then replaced the contents of the file (using Notepad), reverted to split("\\s+"), and found that it worked correctly this time. So at some point I must have introduced a different whitespace-like character into the file, perhaps through a copy/paste. In the end, the takeaway is to use split("[\\s\\h]+") when reading from a file and splitting on spaces, since it covers whitespace characters that may not be immediately obvious.
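A minimal sketch of reading the first line with the broader character class follows. It assumes the file is UTF-8 encoded (the question's FileReader relies on the platform default charset instead), and it writes a temporary file containing a no-break space to stand in for test.txt. RobustSplit, splitFields and readFirstLineFields are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class RobustSplit {

    // \s alone matches only ASCII whitespace; \h adds horizontal whitespace
    // such as U+00A0 (no-break space) and U+202F (narrow no-break space).
    static final String SEPARATORS = "[\\s\\h]+";

    static String[] splitFields(String line) {
        return line.split(SEPARATORS);
    }

    // Assumption: the file is UTF-8. The question's FileReader uses the
    // platform default charset instead.
    static String[] readFirstLineFields(Path file) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String firstLine = reader.readLine();
            return firstLine == null ? new String[0] : splitFields(firstLine);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for test.txt: "5" and "4" separated by a no-break space.
        Path file = Files.createTempFile("test", ".txt");
        Files.write(file, "G H 5\u00A04\n".getBytes(StandardCharsets.UTF_8));
        try {
            String[] fields = readFirstLineFields(file);
            System.out.println("First line array length: " + fields.length);
            for (String field : fields) {
                System.out.println(field);
            }

            // Alternative (not from the thread): the (?U) flag enables
            // UNICODE_CHARACTER_CLASS, so \s itself matches U+00A0.
            System.out.println(Arrays.toString("G H 5\u00A04".split("(?U)\\s+")));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

With the broader class the simulated file line yields four fields (G, H, 5, 4). The (?U) variant is a swapped-in alternative rather than the fix used above: it widens \s to the Unicode White_Space property instead of adding \h alongside it.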
Thanks to all for helping me find the root cause of my issue.