Reputation: 330
I'm trying to parse a folder of CSV files (balance sheets), and everything went smoothly up until I tried to separate the row names from the values.
It looks like the last cell on the previous row is combining with the first cell (the row name in column A) in the next row.
File path = new File("/Users/Zack/Desktop/JavaDB/BALANCESHEETS");
for(File file: path.listFiles()) {
    if (file.isFile()) {
        String fileName = file.getName();
        String ticker = fileName.split("\\_")[0];
        if (ticker.equals("ASB") || ticker.equals("FRC")) {
            if (ticker.equals("ASB")) {
                ticker = ticker + "PRD";
            }
            if (ticker.equals("FRC")) {
                ticker = ticker + "PRD";
            }
        }
        Reader reader = new BufferedReader(new FileReader(file));
        StringBuilder builder = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            builder.append((char) c);
        }
        String string = builder.toString();
        ArrayList<String> stringResult = new ArrayList<String>();
        if (string != null) {
            String[] splitData = string.split("\\s*,\\s*");
            for (int i = 0; i <splitData.length; i++) {
                if (!(splitData[i] == null) || !(splitData[i].length() ==0)) {
                    stringResult.add(splitData[i].trim());
                }
            }
        }
        for (int i = 0; i < stringResult.size(); i++) {
            int cL = stringResult.get(i).length();
            for (int x = 0; x < cL; x++) {
                if (Character.isLetter(stringResult.get(i).charAt(x))) {
                    System.out.println("index: " + i);
                    System.out.println(stringResult.get(i));
                    break;
                }
            }
        }
Here are some photos of what's happening https://postimg.org/image/a9qc1qggz/ https://postimg.org/image/mvna7p7s3/
Any idea on how to fix this?
I also noticed there is a space in front of the row names in the spreadsheets, which I suspect may be part of the problem.
Upvotes: 1
Views: 188
Reputation: 504
As Erwin stated in the comments, the pattern you are splitting on only looks for commas with optional whitespace around them. It looks like you know what format your data will be in: the fields are separated either by whitespace-comma-whitespace or by a newline. Seems to me you just need to change the pattern you pass to split() to "\\s*,\\s*|\\R", which is the regex that says exactly that. (\R matches any line break in Java 8 and later; a bare $ would only match at the very end of the input unless multiline mode is enabled.) As has been mentioned, you need to know beforehand that no field contains a comma itself, or this breaks.
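To make that concrete, here is a minimal sketch of splitting on commas or line breaks. The class name, method name and sample data are made up for illustration, and it assumes Java 8+ (for \R) and that no field contains a comma or a line break:

```java
import java.util.ArrayList;
import java.util.List;

class SplitOnCommaOrNewline {
    // Splits raw CSV text into trimmed cells, treating both commas and
    // line breaks as separators. Empty cells are dropped.
    static List<String> splitCells(String text) {
        List<String> cells = new ArrayList<>();
        // \R matches \n, \r\n, \r and the other Unicode line breaks.
        for (String cell : text.split("\\s*,\\s*|\\R")) {
            String trimmed = cell.trim();
            if (!trimmed.isEmpty()) {
                cells.add(trimmed);
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, with a leading space before each row name.
        String text = " Cash, 100, 200\n Total assets, 300, 400\n";
        System.out.println(splitCells(text));
    }
}
```

With the sample above this prints [Cash, 100, 200, Total assets, 300, 400], so the last value of one row no longer sticks to the row name of the next.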
Upvotes: 0
Reputation: 10969
The problem is coming from where you are reading in the file, here:
Reader reader = new BufferedReader(new FileReader(file));
StringBuilder builder = new StringBuilder();
int c;
while ((c = reader.read()) != -1) {
    builder.append((char) c);
}
String string = builder.toString();
This reads all the characters into a single string, including the new line character(s). When you then split the string, you are not splitting on the new line character(s) and so you end up with what you are seeing.
As mentioned by others, I strongly urge you to use one of the many CSV parsers that already exist.
The simple (but ugly) fix would be to also split on newlines. A better fix would be to use the readLine() method of the BufferedReader.
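A rough sketch of the readLine() approach, assuming plain CSV with no quoted fields or embedded commas. The class and method names are made up, and a StringReader with sample data stands in for the FileReader from the question:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ReadLineDemo {
    // Reads the text one line at a time and splits each line on commas.
    // Returns one list of trimmed cells per non-blank line.
    static List<List<String>> readRows(Reader source) {
        List<List<String>> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                List<String> row = new ArrayList<>();
                // Limit -1 keeps trailing empty cells so columns stay aligned.
                for (String cell : line.split(",", -1)) {
                    row.add(cell.trim());
                }
                rows.add(row);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        // In the question this would be new FileReader(file).
        Reader sample = new StringReader(" Cash, 100, 200\n Total assets, 300, 400\n");
        for (List<String> row : readRows(sample)) {
            // The first cell is the row name, the rest are the values.
            System.out.println(row.get(0) + " -> " + row.subList(1, row.size()));
        }
    }
}
```

Because each line is handled on its own, the row name is always the first cell of its row and never gets glued to the previous row's last value.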
Also, != is your friend.
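For what it's worth, here is how the filter in the question's loop could look with !=, assuming the intent was to skip null or empty cells. Note that with || the original condition is true for every non-null string, so nothing is actually filtered out; && is what you want there. The class and method names below are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

class NonEmptyFilterDemo {
    // Keeps only the cells that are neither null nor blank, trimmed.
    static List<String> nonEmptyCells(String[] cells) {
        List<String> result = new ArrayList<>();
        for (String cell : cells) {
            // != instead of !(... == ...), and && so the length is only
            // checked when the cell is not null.
            if (cell != null && cell.trim().length() != 0) {
                result.add(cell.trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical input containing an empty and a null entry.
        String[] cells = {"Cash", "", null, " 100 "};
        System.out.println(nonEmptyCells(cells));
    }
}
```

For the sample array this prints [Cash, 100].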
Upvotes: 1