Reputation: 556
I'm using open csv to read a CSV that has only 2 columns, example:
"valueA1","valueB of A1"
,"valueB of A1"
,"valueB of A1"
,"valueB of A1"
"valueA2","valueB of A2"
,"valueB of A2"
,"valueB of A2"
,"valueB of A2"
And it keeps going, so far so good. I'm having trouble when some of the values on column B are coming with some characters Example:
"valueA1","va"lueB" of A1"
,"valueB of A1"
,"valueB of A1"
,"valueB of A1"
"valueA2","valueB of A2"
,"valueB of A2"
,"valueB of A2"
,"valueB of A2"
When my column B has something like a " or even in some cases a : my open csv loses all the format and instead of saying my next line contains 2 values it shows me crazy things like 36, 48, and it loses a huge amount of content due to that.
How can I specify that these values are inside the colum B and h'es not suposed to assume that he needs to create a new column?
I'm using the simple constructor
reader = new CSVReader(new FileReader(arquivo));
By doing research I saw some of the people saying to use a different constructor, I tried to do
reader = new CSVReader(new FileReader(arquivo), ',','"');
But the result was the same, since the constructor accepts only char I can't do ","
Upvotes: 0
Views: 654
Reputation: 1912
There's no official standard for CSV but there is an unofficial one, and the sample you show doesn't comply with it. Values containing the double-quote character should be quoted, and the double-quote characters within the value should be doubled.
valueA1,"va""lueB"" of A1"
(It's permitted but not required to put double quotes around "valueA1" in this example.) Since you're creating the CSV file yourself, from an XLS file, you should fix the problem in the conversion.
Excel 2007 does this correctly if you save as "CSV (MS-DOS) (*.csv)". I recall that older versions of Excel had an CSV options screen that you might need to play with.
Upvotes: 0
Reputation: 109603
First I still would set the separator and quote explicitly. You said you had a problem with ;
.
CSVReader reader = new CSVReader(new FileReader(arquivo), ',', '\"');
Then there is a data error: a text value va"lueB" of A1
which Excel for instance would self-escape as: va""lueB"" of A1
. I do not know what CSVWriter would make of a double quote.
The least invasive would be to correct the data while reading:
CSVReader reader = new CSVReader(new RepairingReader(new FileReader(arquivo)),
',', '\"', '\\');
Here I also specify the escape character for separator and quote.
CSVReader
uses a BufferedReader
either passed, or added by itself and calls readLine
.
public class RepairingReader extends BufferedReader {
public RepairingReader(Reader reader, int capacity) {
super(reader, capacity);
}
public RepairingReader(Reader reader) {
super(reader);
}
@Override
public String readLine() throws IOException {
String line = super.readLine();
if (line != null) {
line = line.replaceAll("([^,\\\\])\"([^,])", "$1\\\\\"$2");
}
return line;
}
}
This just overrides readLine
. It repaces any quote that has a character in front (not backslash, not comma) and after (not comma). The replacement should be a backslash followed by a quote.
This is not a full parser, as then one would implement a CSV reader oneself.
By the way: FileReader
has no option to set the encoding used for converting the file to Unicode String. The default platform encoding is used. Hence the code is not portable, the file should be in the local encoding. Use new InputStreamReader(new FileInputStream(file), encoding)
instead for portable software.
Upvotes: 1