Reputation: 582
When I use OpenCSV to parse a UTF-8 document without a BOM, the first column is not read. The same content encoded as UTF-8 with a BOM is parsed correctly.
I explicitly set the charset to UTF-8:
fileInputStream = new FileInputStream(file);
inputStreamReader = new InputStreamReader(fileInputStream, StandardCharsets.UTF_8);
reader = new BufferedReader(inputStreamReader);
HeaderColumnNameMappingStrategy<Bean> ms = new HeaderColumnNameMappingStrategy<Bean>();
ms.setType(Bean.class);
CsvToBean<Bean> csvToBean = new CsvToBeanBuilder<Bean>(reader).withType(Bean.class).withMappingStrategy(ms)
.withSeparator(';').build();
csvToBean.parse();
I've created a sample project where the issue can be reproduced: https://github.com/dajoropo/csv2beanSample
Running the unit test shows that the UTF-8 file without a BOM fails while the one with a BOM works correctly.
The error occurs at the second assertion, because the first column is not read. The result is:
[Bean [a=null, b=second, c=third]]
Any hint?
Upvotes: 4
Views: 9382
Reputation: 2210
If I open the Bean class in your project and search for "B", I find one match. If I search for "A", I find none :) That means you copy/pasted the "A" together with a BOM into the Bean class. The BOM is not visible, but it is still treated as part of the column name.
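To illustrate why the search for "A" finds nothing, here is a minimal sketch of how a pasted BOM (the character U+FEFF) changes a string without changing how it looks. The class and method names are hypothetical and not part of the sample project.

```java
class InvisibleBomDemo {

    // U+FEFF is the character the UTF-8 BOM bytes (EF BB BF) decode to.
    static final String BOM = "\uFEFF";

    /** True if the string begins with the invisible BOM character. */
    static boolean startsWithBom(String s) {
        return s.startsWith(BOM);
    }

    /** Returns the string without a leading BOM, or unchanged if there is none. */
    static String stripBom(String s) {
        return startsWithBom(s) ? s.substring(1) : s;
    }

    public static void main(String[] args) {
        String pasted = BOM + "A"; // renders like "A" in most editors

        System.out.println(pasted.equals("A"));           // false
        System.out.println(pasted.length());              // 2
        System.out.println(stripBom(pasted).equals("A")); // true
    }
}
```

A column name like this in an annotation would only match a header that also starts with the BOM, which is consistent with the file with BOM passing and the file without BOM failing.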
Once I fix "A", the other test starts failing, but I think you can fix that with BOMInputStream.
See the question and answer "Byte order mark screws up file reading in Java".
It is a known problem. You can use Apache Commons IO's BOMInputStream to solve it.
I just tried adding
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>2.6</version>
</dependency>
and
inputStreamReader = new InputStreamReader(new BOMInputStream(fileInputStream), StandardCharsets.UTF_8);
and fixing
@CsvBindByName(column = "A")
private String a;
so that "A" no longer carries the invisible BOM prefix. With these changes both tests pass.
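If adding Commons IO is not an option, the same idea can be sketched with the standard library only. This is an assumption-laden illustration rather than what BOMInputStream does internally: it decodes as UTF-8 and discards the first character when it is U+FEFF. The class name and the helpers openSkippingBom and firstHeader are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class BomSkippingReaderDemo {

    /** Opens a UTF-8 reader and consumes a leading BOM (U+FEFF) if there is one. */
    static BufferedReader openSkippingBom(InputStream in) throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        reader.mark(1);                 // remember the start of the stream
        if (reader.read() != 0xFEFF) {  // Java's UTF-8 decoder does not drop the BOM itself
            reader.reset();             // no BOM: rewind so the first character is kept
        }
        return reader;
    }

    /** Returns the first ';'-separated header name of the CSV content. */
    static String firstHeader(byte[] csv) {
        try (BufferedReader reader = openSkippingBom(new ByteArrayInputStream(csv))) {
            String line = reader.readLine();
            return line == null ? "" : line.split(";")[0];
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] withoutBom = "A;B;C".getBytes(StandardCharsets.UTF_8);
        byte[] withBom = "\uFEFFA;B;C".getBytes(StandardCharsets.UTF_8);

        // Both inputs now yield the same header name.
        System.out.println(firstHeader(withoutBom));
        System.out.println(firstHeader(withBom));
    }
}
```

The reader returned by openSkippingBom could be passed to CsvToBeanBuilder in place of the original reader, so files with and without a BOM produce the same header names.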
Upvotes: 8