Reputation: 12384
I have a .csv file generated by Excel that I got from my customer. My software has to open and parse it in Java. I'm using universalchardet, but it did not detect the encoding from the first 1,000 bytes of the file.
Within these first 1,000 bytes there is a sequence that should read as Boîte, yet I cannot find the correct encoding to use to convert this file to UTF-8 strings in Java.
In the file, Boîte is encoded as 42 6F 94 74 65 (read with a hex editor). B, o, t and e use the standard Latin encoding with one byte per character. The î is also encoded in a single byte, 0x94.
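One way to narrow the search is to decode the suspicious bytes with several single-byte charsets and see which one produces the expected î. A minimal sketch (the candidate list is just a sample of common single-byte encodings, not an exhaustive set):

```java
import java.nio.charset.Charset;

public class CharsetGuess {
    public static void main(String[] args) {
        // The bytes observed in the hex editor for "Boîte"
        byte[] bytes = {0x42, 0x6F, (byte) 0x94, 0x74, 0x65};
        // Candidate single-byte encodings; only those the JRE supports are tried
        String[] candidates = {"ISO-8859-1", "windows-1252", "IBM437", "MacRoman"};
        for (String name : candidates) {
            if (Charset.isSupported(name)) {
                System.out.println(name + " -> " + new String(bytes, Charset.forName(name)));
            }
        }
    }
}
```

Whichever charset prints Boîte is a plausible match: MacRoman decodes 0x94 as î, while IBM437 (the classic "extended ASCII" code page) decodes it as ö.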
I don't know how to guess this charset; none of my online searches yielded relevant results. I also tried running file on it:
$ file export.csv
/Users/bicou/Desktop/export.csv: Non-ISO extended-ASCII text, with CR line terminators
However, in the extended-ASCII table I looked at, the value 0x94 stands for ö, not î.
Have you got other ideas for guessing the encoding of that file?
Upvotes: 1
Views: 449
Reputation: 12384
This was the Mac OS Roman encoding. With the following Java code, the text was decoded properly:
InputStreamReader isr = new InputStreamReader(new FileInputStream(targetFileName), "MacRoman");
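For completeness, here is a sketch of reading the whole CSV that way, wrapped in try-with-resources so the stream is closed; targetFileName is assumed to hold the path to the exported file. BufferedReader.readLine also copes with the CR line terminators that file reported:

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

public class ReadMacRomanCsv {
    public static void main(String[] args) throws Exception {
        String targetFileName = args[0]; // assumption: path passed on the command line
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(targetFileName),
                        Charset.forName("MacRoman")))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Each line is now a proper Java String; re-encode as UTF-8 as needed
                System.out.println(line);
            }
        }
    }
}
```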
I don't know how to delete my own question. I don't think it is useful anymore...
Upvotes: 3