user939857

Reputation: 395

Jackson CSV parser chokes on comma-separated value files if "," appears in a field, even when the field is quoted with "

The code:

package org.javautil.salesdata;
import java.io.File;
import java.io.IOException;
import java.util.Map;

import org.javautil.util.ListOfNameValue;

import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

// https://github.com/FasterXML/jackson-dataformats-text/tree/master/csv
public class Manufacturers {
    private static final String fileName= "src/main/resources/pdssr/manufacturers.csv";

    ListOfNameValue getManufacturers() throws IOException {
        ListOfNameValue lnv = new ListOfNameValue();
        File csvFile = new File(fileName);
        CsvMapper mapper = new CsvMapper();

        CsvSchema schema = CsvSchema.emptySchema().withHeader(); // use first row as header; otherwise defaults are fine
        MappingIterator<Map<String,String>> it = mapper.readerFor(Map.class)
           .with(schema)
           .readValues(csvFile);
        while (it.hasNext()) {
          Map<String,String> rowAsMap = it.next();
          System.out.println(rowAsMap);
        }

        return lnv;

    }

}

The data:

"mfr_id","mfr_cd","mfr_name"
"0000000020","F-L", "Frito-Lay"
"0000000030","GM", "General Mills"
"0000000040","HVEND", "Hershey Vending"
"0000000050","HFUND", "Hershey Fund Raising"
"0000000055","HCONC", "Hershey Concession"
"0000000060","SNYDERS", "Snyder's of Hanover"
"0000000080","KELLOGG", "Kellogg & Keebler"
"0000000115","KARS", "Kar Nut Product (Kar's)"
"0000000135","MARS", "Mars Chocolate "
"0000000145","POORE", "Inventure Group (Poore Brothers)"
"0000000150","WOW", "WOW Foods"
"0000000160","CADBURY", "Cadbury Adam USA, LLC"
"0000000170","MONOGRAM", "Monogram Food"
"0000000185","JUSTBORN", "Just Born"
"0000000190","HOSTESS", "Hostess, Dolly Madison"
"0000000210","SARALEE", "Sara Lee"

The exception is

com.fasterxml.jackson.databind.exc.RuntimeJsonMappingException: Too many entries: expected at most 3 (value #3 (4 chars) "LLC"")

I thought I would throw out my own CSV parser and adopt a supported project with more functionality, but most of the alternatives are far slower, simply break, or have examples all over the web that don't work with the current release of the product.

Upvotes: 4

Views: 5995

Answers (2)

Bruce Martin

Reputation: 10543

The problem is that your file does not meet the CSV standard: the third field always starts with a space before the opening quote.

mfr_id","mfr_cd","mfr_name"
"0000000020","F-L", "Frito-Lay"
"0000000030","GM", "General Mills"
"0000000040","HVEND", "Hershey Vending"
"0000000050","HFUND", "Hershey Fund Raising"

From wikipedia:

According to RFC 4180, spaces outside quotes in a field are not allowed; however, the RFC also says that "Spaces are considered part of a field and should not be ignored." and "Implementors should 'be conservative in what you do, be liberal in what you accept from others' (RFC 793, section 2.10) when processing CSV files."

Jackson is being "liberal" when processing most of your records, but when it finds

"0000000160","CADBURY", "Cadbury Adam USA, LLC"

It has no choice but to treat it as 4 fields:

  • '0000000160'
  • 'CADBURY'
  • ' "Cadbury Adam USA'
  • ' LLC"'

I would suggest fixing the file, as that will allow it to be parsed with most CSV libraries. You could also try another library; there is no shortage of them.
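
If editing the data by hand is not practical, one possible workaround is to normalize the file before handing it to Jackson. The sketch below is not from the answer: it assumes the stray space only ever appears between the separating comma and the opening quote of the next field (true for the sample data above, but not guaranteed for arbitrary CSV), and the class and method names are illustrative.

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.List;
    import java.util.stream.Collectors;

    public class CsvSpaceFixer {

        // Hypothetical helper: rewrites the file so each quoted field starts
        // immediately after the comma, making it valid RFC 4180 input.
        // Assumes no quoted value itself contains the sequence ,<spaces>"
        static void stripSpacesBeforeQuotes(Path in, Path out) throws IOException {
            List<String> fixed = Files.readAllLines(in, StandardCharsets.UTF_8).stream()
                    .map(line -> line.replaceAll(",\\s+\"", ",\""))
                    .collect(Collectors.toList());
            Files.write(out, fixed, StandardCharsets.UTF_8);
        }
    }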

Upvotes: 2

Jeronimo Backes

Reputation: 6289

univocity-parsers can handle that without any issues. It's built to deal with all sorts of tricky and non-standard CSV files and is also faster than the parser you are using.

Try this code:

    import java.io.File;
    import java.util.Map;

    import com.univocity.parsers.common.record.Record;
    import com.univocity.parsers.csv.CsvParser;
    import com.univocity.parsers.csv.CsvParserSettings;

    String fileName = "src/main/resources/pdssr/manufacturers.csv";
    CsvParserSettings settings = new CsvParserSettings();
    settings.setHeaderExtractionEnabled(true); // use the first row as the header

    CsvParser parser = new CsvParser(settings);
    for (Record record : parser.iterateRecords(new File(fileName))) {
        Map<String, String> rowAsMap = record.toFieldMap();
        System.out.println(rowAsMap);
    }
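
As a side note (my assumption, not something stated in the answer): univocity ignores leading whitespace in values by default, which is presumably what lets the space before the opening quote through. If you want to make that behavior explicit, something like this should work:

    // Assumption: ignoring leading whitespace (univocity's default) is what
    // allows ` "Frito-Lay"` to be read back as `Frito-Lay`.
    settings.setIgnoreLeadingWhitespaces(true);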

Hope this helps.

Disclosure: I'm the author of this library. It's open source and free (Apache 2.0 license).

Upvotes: 2
