reeeeeeeeeeee

Reputation: 139

How to read in CSV columns in any order using CSVParser from apache.commons

I have a csv file with some data in this format:

id,first,last,city
1,john,doe,austin
2,jane,mary,seattle

As of now I'm reading in the csv using this code:

    String path = "./data/data.csv";
    Map<Integer, User> map = new HashMap<>();

    Reader reader = Files.newBufferedReader(Paths.get(path));

    try (CSVParser csvParser = new CSVParser(reader, CSVFormat.DEFAULT)) {

        List<CSVRecord> csvRecords = csvParser.getRecords();

        for(int i=0; i < csvRecords.size(); i++){

            if(0<i){//skip over header
                CSVRecord csvRecord = csvRecords.get(i);
                User currentUser = new User(
                        Integer.parseInt(csvRecord.get(0)), // id
                        csvRecord.get(1),                   // first
                        csvRecord.get(2),                   // last
                        csvRecord.get(3)                    // city
                );
                map.put(currentUser.getId(), currentUser);
            }
        }
    } catch (IOException e){
        System.out.println(e);
    }

This grabs the correct values, but if the columns were in a different order, say [city,last,id,first], they would be read incorrectly, because the order [id,first,last,city] is hard-coded in the reading logic. (The User object must also be created with the fields in exactly the order id,first,last,city.)

I know that I can use the 'withHeader' option, but that also requires me to define the header column order in advance like so:

String header = "id,first,last,city";
CSVParser csvParser = new CSVParser(reader, CSVFormat.EXCEL.withHeader(header.split(",")));

I also know there is a built-in method getHeaderNames(), but it only returns the headers after I've already passed them in as a string (so hard-coding again). If I passed in the header string "last,first,id,city", it would return exactly that as a list.

Is there a way to combine these bits to read in the csv no matter what the column orders are and to define my 'User' object with fields passed in order (id,first,last,city)?

Upvotes: 3

Views: 12267

Answers (1)

Andreas

Reputation: 159165

We need to tell the parser to process the header line for us. We specify that as part of the CSVFormat, so we'll create a custom format like this:

CSVFormat csvFormat = CSVFormat.RFC4180.withFirstRecordAsHeader();

The question's code used DEFAULT, but this format is based on RFC4180 instead. Comparing the two side by side:

DEFAULT                               RFC4180                       Comment
===================================   ===========================   ========================
withDelimiter(',')                    withDelimiter(',')            Same
withQuote('"')                        withQuote('"')                Same
withRecordSeparator("\r\n")           withRecordSeparator("\r\n")   Same
withIgnoreEmptyLines(true)            withIgnoreEmptyLines(false)   Don't ignore blank lines
withAllowDuplicateHeaderNames(true)   -                             Don't allow duplicates
===================================   ===========================   ========================
                                      withFirstRecordAsHeader()     We need this

With that change, we can call get(String name) instead of get(int i):

User currentUser = new User(
        Integer.parseInt(csvRecord.get("id")),
        csvRecord.get("first"),
        csvRecord.get("last"),
        csvRecord.get("city")
);
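
To show why looking up by name makes the column order irrelevant, here is a minimal plain-Java sketch of the idea: read the header line once, remember each name's position, then resolve names to positions for every row. The class HeaderIndexSketch and its methods are hypothetical names for illustration only. They are not Commons CSV API and not how the library is implemented, and the naive split(",") assumes there are no quoted fields or embedded commas.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Commons CSV: illustrates name-based lookup.
class HeaderIndexSketch {

    // Maps each header name to its column position, e.g. "id" -> 2.
    static Map<String, Integer> indexByName(String headerLine) {
        Map<String, Integer> index = new HashMap<>();
        String[] names = headerLine.split(",");
        for (int i = 0; i < names.length; i++) {
            index.put(names[i].trim(), i);
        }
        return index;
    }

    // Returns the value of the named column in one data line.
    // Assumption: simple unquoted fields, so a plain split is enough.
    static String get(Map<String, Integer> index, String dataLine, String name) {
        Integer position = index.get(name);
        if (position == null) {
            throw new IllegalArgumentException("No such column: " + name);
        }
        return dataLine.split(",", -1)[position];
    }

    public static void main(String[] args) {
        Map<String, Integer> index = indexByName("last,first,id,city");
        System.out.println(get(index, "doe,john,1,austin", "id"));   // prints 1
        System.out.println(get(index, "doe,john,1,austin", "city")); // prints austin
    }
}
```

In real code the parser handles quoting and escaping, so this sketch is only meant to explain the lookup, not to replace CSVParser.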

Note that CSVParser implements Iterable<CSVRecord>, so we can use a for-each loop, which makes the code look like this:

String path = "./data/data.csv";

Map<Integer, User> map = new HashMap<>();
try (CSVParser csvParser = new CSVParser(Files.newBufferedReader(Paths.get(path)),
                                         CSVFormat.RFC4180.withFirstRecordAsHeader())) {
    for (CSVRecord csvRecord : csvParser) {
        User currentUser = new User(
                Integer.parseInt(csvRecord.get("id")),
                csvRecord.get("first"),
                csvRecord.get("last"),
                csvRecord.get("city")
        );
        map.put(currentUser.getId(), currentUser);
    }
}

That code correctly parses the file, even if the column order changes, e.g. to:

last,first,id,city
doe,john,1,austin
mary,jane,2,seattle
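
As a quick sanity check of that claim, the following stdlib-only sketch keys each row by its header name and compares the two column orders. ColumnOrderCheck and parse are hypothetical names, and the split-based parsing assumes simple unquoted fields with every row as long as the header. With Commons CSV, the parser does this work for us.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical check, not Commons CSV: rows keyed by header name
// compare equal regardless of the column order in the file.
class ColumnOrderCheck {

    // Parses CSV text into one map per data row, keyed by header name.
    static List<Map<String, String>> parse(String csv) {
        String[] lines = csv.split("\\R");
        String[] header = lines[0].split(",");
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].isEmpty()) {
                continue; // skip blank lines
            }
            String[] values = lines[i].split(",", -1);
            Map<String, String> row = new HashMap<>();
            for (int c = 0; c < header.length; c++) {
                row.put(header[c], values[c]);
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        String original = "id,first,last,city\n1,john,doe,austin\n2,jane,mary,seattle\n";
        String reordered = "last,first,id,city\ndoe,john,1,austin\nmary,jane,2,seattle\n";
        // Map equality ignores insertion order, so only names and values matter.
        System.out.println(parse(original).equals(parse(reordered))); // prints true
    }
}
```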

Upvotes: 3
