Reputation: 978
I have a CSV file in which every field (except the column headings) uses a double quote as its text qualifier: field: "some value"
However, some of the fields contain a double quote inside the value: field2: "25" TV"
or field3: "25" x 14" x 2""
or field4: "A"bcd"ef"g"
(I think you get the point.) For rows with data like fields 2-4, my Java file processing fails: because I specify the double quote as the text qualifier, the row appears to contain too many fields. How do I do either or all of the following:
What is my level of control over this file? The file comes in as-is, but I just need data from two different columns in the file. I can do whatever I need to do to it to get that data.
Upvotes: 1
Views: 1145
Reputation: 6457
Assuming that a comma is the column separator and that every column is surrounded by double quotes:
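// Split on the quote-comma-quote boundary between adjacent fields.
String[] columns = input.split("\",\"");
if (columns.length > 0) {
    // Drop the opening quote of the first field.
    columns[0] = columns[0].substring(1);
    // Drop the closing quote of the last field.
    String lastColumn = columns[columns.length - 1];
    columns[columns.length - 1] = lastColumn.substring(0, lastColumn.length() - 1);
}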
The columns will still contain the internal double quotes. If you don't want them, you can strip them with replace.
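As a rough sketch, the approach above can be wrapped in a helper that optionally strips the internal quotes. The class name QuotedLineSplitter, the method splitLine, and the stripInnerQuotes flag are made up for illustration. It assumes every field is quoted and that fields are separated by a bare comma with no surrounding whitespace.

```java
import java.util.Arrays;

class QuotedLineSplitter {
    // Hypothetical helper: splits a line in which every field is wrapped in
    // double quotes and fields are separated by a bare comma.
    static String[] splitLine(String input, boolean stripInnerQuotes) {
        // Remove the line's opening and closing quote, if both are present.
        if (input.length() >= 2 && input.startsWith("\"") && input.endsWith("\"")) {
            input = input.substring(1, input.length() - 1);
        }
        // A limit of -1 keeps trailing empty fields instead of discarding them.
        String[] columns = input.split("\",\"", -1);
        if (stripInnerQuotes) {
            for (int i = 0; i < columns.length; i++) {
                // Drop the stray quotes left inside the value.
                columns[i] = columns[i].replace("\"", "");
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        String line = "\"some value\",\"25\" TV\",\"25\" x 14\" x 2\"\"";
        // Keep the inner quotes (the inch marks) in this run.
        System.out.println(Arrays.toString(splitLine(line, false)));
    }
}
```

Since only two columns are needed, the caller can pick them out of the returned array by index. Whether the inner quotes should be kept depends on whether they carry meaning, as the inch marks do here.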
Upvotes: 0
Reputation: 1705
First, if it is indeed a CSV file, you should be using the commas to break each line into columns.
Once it's broken into columns, and if we know for sure that each value should begin and end with a double quote ("), we can simply remove all of the double quotes and then re-apply the ones at the beginning and end.
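String input = "\"hello\",\"goodbye Java \"the best\" language\", \"this is really \"\"\"bad\"";
// Split on every comma; this only works if no value contains a comma.
String[] parsed = input.split(",");
String[] clean = new String[parsed.length];
int index = 0;
for (String value : parsed) {
    // Remove every quote, trim whitespace left around the separator, then re-wrap.
    clean[index] = "\"" + value.replace("\"", "").trim() + "\"";
    index++;
}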
If a comma could exist inside a value, the following should be used instead:
String input = "\"hello\",\"goodbye,\" Java \"the best\" language\", \"this is really \"\"\"bad\"";
// Split only where a closing quote, a comma, and an opening quote meet.
String[] parsed = input.split("\"\\s*,\\s*\"");
String[] clean = new String[parsed.length];
int index = 0;
for (String value : parsed) {
    // Remove any remaining quotes, then re-wrap the value in quotes.
    clean[index] = "\"" + value.replace("\"", "") + "\"";
    index++;
}
Note that if the sequence \"\s*,\s*\" (quote, optional whitespace, comma, optional whitespace, quote) occurred inside a value, the record would be ambiguous. For example, in a two-column file, the input record "abc","def","ghi" could be either
value 1 = "abc","def" and value 2 = "ghi", or value 1 = "abc" and value 2 = "def","ghi"
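One way to catch such records, sketched here on the assumption that the expected column count is known in advance, is to compare the number of pieces the split produces against that count. The class AmbiguityCheck and its methods are hypothetical names used only for this illustration.

```java
class AmbiguityCheck {
    // Counts the pieces produced by splitting on quote-comma-quote boundaries.
    // The -1 limit keeps trailing empty pieces so the count is not understated.
    static int countFields(String record) {
        return record.split("\"\\s*,\\s*\"", -1).length;
    }

    // More pieces than expected means the boundary sequence also occurs
    // inside some value, so the record cannot be split reliably.
    static boolean isAmbiguous(String record, int expectedColumns) {
        return countFields(record) > expectedColumns;
    }

    public static void main(String[] args) {
        String record = "\"abc\",\"def\",\"ghi\"";
        // Two columns are expected, but the split yields three pieces.
        System.out.println(isAmbiguous(record, 2));
    }
}
```

Records flagged this way could be logged and skipped, or set aside for manual review, rather than guessed at.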
Upvotes: 1
Reputation: 903
Note that many CSV implementations escape a double quote inside a quoted field by writing it as two consecutive quotes.
So "25"" TV"
might (should?) be your input.
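If the file did follow that convention, a small hand-written parser could read it. The following is a minimal sketch, not a full CSV implementation: the class DoubledQuoteParser and the method parseLine are illustrative names, and it assumes one record per line with a comma as the separator. It would not repair the unescaped quotes in the question's file.

```java
import java.util.ArrayList;
import java.util.List;

class DoubledQuoteParser {
    // Parses one line, treating "" inside a quoted field as a literal quote.
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote: literal quote
                        i++;                 // skip the second quote of the pair
                    } else {
                        inQuotes = false;    // single quote: field closes
                    }
                } else {
                    current.append(c);       // commas inside quotes are data
                }
            } else if (c == '"') {
                inQuotes = true;             // field opens
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);        // start the next field
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("\"25\"\" TV\",\"a, b\""));
    }
}
```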
Upvotes: 0