Reputation: 59
I am trying to parse a CSV file using Jakarta Commons CSV (commons-csv).
Sample input file:
Field1,Field2,Field3,Field4,Field5
"Ryan, R"u"bianes"," [email protected]","29445","626","South delhi, Rohini 122001"
Formatter: CSVFormat.newFormat(',').withIgnoreEmptyLines().withQuote('"') (CSV_DELIMITER is ',')
Output:
Exception: Caused by: java.io.IOException: (line 2) invalid char between encapsulated token and delimiter
Upvotes: 3
Views: 5315
Reputation: 6289
The problem here is that the quotes are not properly escaped, and your parser doesn't handle that. Try univocity-parsers, as it is the only parser for Java I know of that can handle unescaped quotes inside a quoted value. It is also 4 times faster than Commons CSV. Try this code:
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;
import com.univocity.parsers.csv.UnescapedQuoteHandling;
import java.io.StringReader;
import java.util.List;

// configure the parser to handle your situation
CsvParserSettings settings = new CsvParserSettings();
settings.setHeaderExtractionEnabled(true); // uses the first line as headers
// treat a quote as closing only when a delimiter or line end follows it
settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_CLOSING_QUOTE);
settings.trimQuotedValues(true); // trim whitespace around values in quotes

// create the parser
CsvParser parser = new CsvParser(settings);
String input = "" +
        "Field1,Field2,Field3,Field4,Field5\n" +
        "\"Ryan, R\"u\"bianes\",\" [email protected]\",\"29445\",\"626\",\"South delhi, Rohini 122001\"";

// parse your input
List<String[]> rows = parser.parseAll(new StringReader(input));

// print the parsed values
for (String[] row : rows) {
    for (String value : row) {
        System.out.println('[' + value + ']');
    }
    System.out.println("-----");
}
This will print:
[Ryan, R"u"bianes]
[[email protected]]
[29445]
[626]
[South delhi, Rohini 122001]
-----
Hope it helps.
Disclosure: I'm the author of this library. It's open source and free (Apache 2.0 license).
Upvotes: 0
Reputation: 718916
The problem is that your file is not following the accepted standard for quoting in CSV files. The correct way to represent a quote in a quoted string is by repeating the quote. For example:
Field1,Field2,Field3,Field4,Field5
"Ryan, R""u""bianes"," [email protected]","29445","626","South delhi, Rohini 122001"
If you restrict yourself to the standard form of CSV quoting, the Apache Commons CSV parser should work.
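If you control the code that writes the file, the doubling rule is easy to apply mechanically. Here is a minimal sketch using only the JDK. The class CsvQuoting and its methods are hypothetical names made up for illustration, not part of Apache Commons CSV, and the email field is left out of the sample data.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of any CSV library.
final class CsvQuoting {

    // Wrap a value in double quotes and double every embedded quote.
    static String quote(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Quote every field and join the results with commas to form one record.
    static String toRecord(List<String> fields) {
        return fields.stream()
                .map(CsvQuoting::quote)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList(
                "Ryan, R\"u\"bianes", "29445", "626", "South delhi, Rohini 122001");
        // The first field is written as "Ryan, R""u""bianes"
        System.out.println(toRecord(fields));
    }
}
```

A record written this way uses the standard quoting form described above, so a standards-following parser can read the embedded quotes and commas back unchanged.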
Unfortunately, it is not feasible to write a consistent parser for your variant format, because there is no way to disambiguate an embedded comma from a field separator if you need to represent a field containing "Ryan R","baines".
The rules for quoting in CSV files are set out in various places including RFC 4180.
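To make the ambiguity concrete, here is a rough sketch of a single-line parser that follows the doubled-quote rule. It is a hypothetical illustration (the class MinimalCsvLine is made up), not the Commons CSV implementation. It is deliberately simplified: it handles one line only and does not reject stray characters after a closing quote.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified single-line parser for illustration only.
final class MinimalCsvLine {

    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote is a literal quote
                        i++;                 // skip the second quote of the pair
                    } else {
                        inQuotes = false;    // a single quote closes the field
                    }
                } else {
                    current.append(c);       // commas inside quotes are data
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString()); // separator outside quotes
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Undoubled quotes: read as two separate fields.
        System.out.println(parse("\"Ryan R\",\"baines\"").size());
        // Doubled quotes: read as one field containing quotes and a comma.
        System.out.println(parse("\"Ryan R\"\",\"\"baines\"").size());
    }
}
```

With doubled quotes the two intentions produce different text, so the parser can tell them apart. In the variant format both would be written identically, which is why no consistent rule can recover the intended field.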
Upvotes: 3