Topper Harley

Reputation: 12384

How is this file encoded?

I have a .csv file generated by Excel that I got from my customer. My software has to open and parse it in Java. I'm using universalchardet, but it could not detect the encoding from the first 1,000 bytes of the file.

Within these first 1,000 bytes there is a sequence that should read as Boîte. However, I cannot find the right encoding to use to convert this file to UTF-8 strings in Java.

In the file, Boîte is encoded as 42 6F 94 74 65 (read with a hex editor). B, o, t and e use the standard ASCII encoding, one byte per character. The î is also encoded as a single byte, 0x94.

I don't know how to guess this charset, and none of my searches online turned up anything relevant.

I also tried running the file command on it:

$ file export.csv
/Users/bicou/Desktop/export.csv: Non-ISO extended-ASCII text, with CR line terminators

However, in the extended-ASCII charset (e.g. code page 437), the value 0x94 stands for ö, not î.
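Since the expected word is known, one way to narrow the candidates down is to brute-force it: decode the five bytes with every charset the JVM ships and keep only those that yield exactly Boîte. The sketch below assumes a standard JDK that includes the extended charsets (the jdk.charsets module); the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class CharsetProbe {

    // Hypothetical helper: returns the canonical names of every charset in this
    // JVM that decodes `bytes` to exactly `expected`.
    static List<String> matchingCharsets(byte[] bytes, String expected) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Charset> entry : Charset.availableCharsets().entrySet()) {
            // new String(byte[], Charset) substitutes unmappable input instead of
            // throwing, so a charset only counts if the result matches exactly.
            String decoded = new String(bytes, entry.getValue());
            if (decoded.equals(expected)) {
                matches.add(entry.getKey());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // The five bytes seen in the hex editor: B o 0x94 t e
        byte[] sample = {0x42, 0x6F, (byte) 0x94, 0x74, 0x65};
        // "Bo\u00eete" is "Boîte", written with an escape to avoid source-encoding issues
        System.out.println(matchingCharsets(sample, "Bo\u00eete"));
    }
}
```

Expect more than one hit (several Mac-family code pages share this byte), so the list is a shortlist to check against other non-ASCII characters in the file rather than a final answer.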

Have you got other ideas for guessing the encoding of that file?

Upvotes: 1

Views: 449

Answers (1)

Topper Harley

Reputation: 12384

This was Mac OS Roman encoding. With the following Java code, the text was decoded correctly:

InputStreamReader isr = new InputStreamReader(new FileInputStream(targetFileName), "MacRoman");
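A slightly fuller sketch of reading the whole export, assuming the file is entirely Mac OS Roman. It uses "x-MacRoman", the java.nio canonical name of the same charset the one-liner above selects through its java.io alias "MacRoman". The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class MacRomanCsv {

    // Canonical java.nio name; requires the JDK's extended charsets.
    static final Charset MAC_ROMAN = Charset.forName("x-MacRoman");

    // Reads every line of the file as Mac OS Roman text. Java strings are
    // UTF-16 internally, so no further conversion is needed before parsing.
    static List<String> readLines(Path path) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(path, MAC_ROMAN)) {
            String line;
            // readLine() treats a lone '\r' as a line terminator, which matters
            // because `file` reported CR line terminators for this export.
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        for (String line : readLines(Paths.get(args[0]))) {
            System.out.println(line);
        }
    }
}
```

One design difference from InputStreamReader: Files.newBufferedReader throws a MalformedInputException on undecodable bytes instead of silently substituting them, so a wrong charset guess fails loudly rather than producing mojibake.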

I don't know how to delete my own question. I don't think it is useful anymore...

Upvotes: 3
