Daniel Rodríguez

Reputation: 582

OpenCSV CsvToBean: First column not read for UTF-8 Without BOM

When I use OpenCSV to parse a UTF-8 document without a BOM, the first column is not read. The same document content encoded as UTF-8 with a BOM is parsed correctly.

I explicitly set the charset to UTF-8:

    // Decode the file explicitly as UTF-8
    fileInputStream = new FileInputStream(file);
    inputStreamReader = new InputStreamReader(fileInputStream, StandardCharsets.UTF_8);
    reader = new BufferedReader(inputStreamReader);
    // Map the CSV header names to the annotated fields of Bean
    HeaderColumnNameMappingStrategy<Bean> ms = new HeaderColumnNameMappingStrategy<>();
    ms.setType(Bean.class);
    CsvToBean<Bean> csvToBean = new CsvToBeanBuilder<Bean>(reader).withType(Bean.class).withMappingStrategy(ms)
            .withSeparator(';').build();
    csvToBean.parse();

I've created a sample project where the issue can be reproduced: https://github.com/dajoropo/csv2beanSample

Running the unit test shows that the UTF-8 file without a BOM fails, while the file with a BOM works correctly.

The error occurs in the second assertion, because the first column is not read. The result is:

    [Bean [a=null, b=second, c=third]]

Any hint?

Upvotes: 4

Views: 9382

Answers (1)

Alexander Pavlov

Reputation: 2210

If I open the Bean class in your project and search for "B", I find one match. If I search for "A", I find none :) That means you copy/pasted the "A" together with a BOM into the Bean class. The BOM is not visible, but it is still treated as part of the column name.
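To make the invisible character visible, here is a minimal sketch that uses only the JDK. The class name `BomHeaderDemo`, the helper `firstHeader`, and the sample content `A;B;C` are assumptions for illustration and are not taken from the sample project. It shows that Java's UTF-8 decoder keeps the BOM as the character U+FEFF, so a column name with a pasted BOM matches only the header of the file that starts with a BOM.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the sample project.
class BomHeaderDemo {

    // Decodes the bytes as UTF-8 and returns the first ';'-separated header name.
    static String firstHeader(byte[] csvBytes) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(csvBytes), StandardCharsets.UTF_8))) {
            String line = reader.readLine();
            return line == null ? "" : line.split(";")[0];
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String csv = "A;B;C\nfirst;second;third\n"; // assumed file content
        byte[] withoutBom = csv.getBytes(StandardCharsets.UTF_8);
        byte[] withBom = ("\uFEFF" + csv).getBytes(StandardCharsets.UTF_8);

        String cleanName = "A";
        String pastedName = "\uFEFF" + "A"; // looks like "A" in most editors

        System.out.println(pastedName.length());                        // 2
        System.out.println(cleanName.equals(firstHeader(withoutBom)));  // true
        System.out.println(cleanName.equals(firstHeader(withBom)));     // false
        System.out.println(pastedName.equals(firstHeader(withBom)));    // true
        System.out.println(pastedName.equals(firstHeader(withoutBom))); // false
    }
}
```

This matches the symptoms in the question: with the BOM pasted into the annotation, only the file with a BOM produces a header that equals the column name.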

If I fix "A", the other test starts failing, but I think you can fix that with BOMInputStream.

See the question and answer "Byte order mark screws up file reading in Java".

It is a known problem. You can use Apache Commons IO's BOMInputStream to solve it.

I just tried

    <dependency>
        <groupId>commons-io</groupId>
        <artifactId>commons-io</artifactId>
        <version>2.6</version>
    </dependency>

and

    inputStreamReader = new InputStreamReader(new BOMInputStream(fileInputStream), StandardCharsets.UTF_8);

and fixing

    @CsvBindByName(column = "A")
    private String a;

so that "A" no longer carries the BOM prefix. With these changes both tests pass.
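If adding Commons IO is not an option, a similar effect can probably be had with the JDK alone. The following is only a sketch of that alternative, not what was tested above: the class `BomSkippingReaderDemo` and its methods are hypothetical names, and it assumes the input is always UTF-8. It reads one character and rewinds unless that character is U+FEFF.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; a JDK-only alternative to BOMInputStream for UTF-8 input.
class BomSkippingReaderDemo {

    // Returns a UTF-8 reader positioned after a leading BOM, if one is present.
    static BufferedReader openUtf8SkippingBom(InputStream in) {
        try {
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(in, StandardCharsets.UTF_8));
            reader.mark(1);                 // remember the start of the stream
            if (reader.read() != '\uFEFF') {
                reader.reset();             // no BOM: rewind to the first character
            }
            return reader;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convenience method for the demo: first line of the content, BOM skipped.
    static String firstLine(byte[] bytes) {
        try (BufferedReader reader = openUtf8SkippingBom(new ByteArrayInputStream(bytes))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String csv = "A;B;C\nfirst;second;third\n"; // assumed file content
        byte[] withBom = ("\uFEFF" + csv).getBytes(StandardCharsets.UTF_8);
        byte[] withoutBom = csv.getBytes(StandardCharsets.UTF_8);

        System.out.println(firstLine(withBom));    // A;B;C
        System.out.println(firstLine(withoutBom)); // A;B;C
    }
}
```

The returned reader could then be passed to `CsvToBeanBuilder` in place of the original `reader`. Unlike BOMInputStream, this sketch does not detect other encodings.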

Upvotes: 8
