Paul

Reputation: 1383

Reading text files of different encodings in Java

If I had a file encoded in ISO but wanted to read it as UTF-8 using Java, would I still get the same text?

Would special characters such as µÃÿ display the same?

Upvotes: 0

Views: 486

Answers (2)

alvonellos

Reputation: 1062

In short, no. Beyond the ASCII range, ISO and UTF-8 do not represent characters with the same bytes.

However, you can convert a file from ISO to UTF-8. Going from UTF-8 to ISO is not possible in general, because UTF-8 can represent many more characters than ISO can.
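As a minimal sketch of that conversion, assuming "ISO" here means ISO-8859-1: decode the bytes to a String with the source charset, then encode the String as UTF-8. The class and method names (`Latin1Converter`, `latin1ToUtf8`, `fitsInLatin1`) are made up for illustration. The second method uses `CharsetEncoder.canEncode` to check whether a piece of text could go the other way without loss.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

class Latin1Converter {
    // Re-encodes ISO-8859-1 bytes as UTF-8 by decoding to a String first.
    static byte[] latin1ToUtf8(byte[] latin1Bytes) {
        String text = new String(latin1Bytes, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Reports whether every character of the text exists in ISO-8859-1.
    static boolean fitsInLatin1(String text) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        // 0xB5 is the micro sign in ISO-8859-1; UTF-8 needs two bytes for it.
        byte[] utf8 = latin1ToUtf8(new byte[] {(byte) 0xB5});
        System.out.println(utf8.length);            // 2
        System.out.println(fitsInLatin1("\u00b5")); // true (micro sign)
        System.out.println(fitsInLatin1("\u20ac")); // false (euro sign is not in ISO-8859-1)
    }
}
```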

My recommendation would be to detect the encoding (see: Java : How to determine the correct charset encoding of a stream) and then to handle each case accordingly.

Upvotes: 0

nneonneo

Reputation: 179392

No, you would not. UTF-8 does not encode characters beyond U+007f in the same way as ISO-8859-1 (ISO-8859-1 encodes U+0080 through U+00ff as single bytes \x80 to \xff, while UTF-8 uses two bytes for each of those characters).
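A small sketch that illustrates the difference, using the characters from the question. The class and method names are hypothetical; the characters are written as `\u` escapes so the example does not depend on the encoding of the source file itself.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingWidths {
    // Number of bytes the text occupies in the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    // Simulates the mistake from the question: ISO-8859-1 bytes decoded as UTF-8.
    static String decodeLatin1BytesAsUtf8(String text) {
        byte[] latin1Bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String sample = "\u00b5\u00c3\u00ff"; // the characters µÃÿ
        System.out.println(encodedLength(sample, StandardCharsets.ISO_8859_1)); // 3
        System.out.println(encodedLength(sample, StandardCharsets.UTF_8));      // 6
        // The misread text does not match the original.
        System.out.println(sample.equals(decodeLatin1BytesAsUtf8(sample)));     // false
    }
}
```

The bytes \xb5 \xc3 \xff are not a valid UTF-8 sequence, so `new String(bytes, charset)` substitutes replacement characters instead of returning the original text.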

You have to use an explicit encoding specification when opening the file: new InputStreamReader(new FileInputStream(...), <encoding>)
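Expanded into a fuller sketch, that could look like the following. The class name `ExplicitCharsetReader`, the method `readAll`, and the command-line path are assumptions for illustration; the charset passed in has to match how the file was actually written.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ExplicitCharsetReader {
    // Reads the whole file, decoding its bytes with the given charset.
    static String readAll(String path, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), charset))) {
            char[] buffer = new char[4096];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java ExplicitCharsetReader <file>");
            return;
        }
        // Assumes the file named on the command line is ISO-8859-1 encoded.
        System.out.println(readAll(args[0], StandardCharsets.ISO_8859_1));
    }
}
```

The try-with-resources block closes the reader, and with it the underlying stream, even if reading fails partway through.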

Upvotes: 1
