Reputation: 973
So, I have a file in ISO-8859-1 encoding. I do the following:
InputStreamReader isr = new InputStreamReader(new FileInputStream(fileLocation));
System.out.println(isr.getEncoding());
And I get UTF8. It looks like FileInputStream or InputStreamReader converts it to UTF-8.
Yes, I know about this alternative:
BufferedReader br = new BufferedReader(
        new InputStreamReader(
                new FileInputStream(fileLocation), "ISO-8859-1"));
But I don't know in advance which encoding my file will have.
How can I read the file while preserving its encoding?
Upvotes: 1
Views: 114
Reputation: 109547
A file is just bytes. When those bytes are text in some encoding, the file unfortunately does not record which encoding (charset) was used.
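This is also why the question prints UTF8: nothing is converted. InputStreamReader(InputStream) without a charset argument decodes with the platform default charset, and getEncoding() reports the (historical) name of that decoder's charset, not anything read from the file. A minimal sketch that checks this; the class and method names are made up for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

final class DefaultCharsetDemo {
    // Opens a reader WITHOUT a charset argument and reports which charset it decodes with.
    static Charset readerCharsetWithoutArgument(byte[] data) {
        try (InputStreamReader isr = new InputStreamReader(new ByteArrayInputStream(data))) {
            // getEncoding() returns a historical name such as "UTF8";
            // Charset.forName maps it back to the canonical charset.
            return Charset.forName(isr.getEncoding());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = {(byte) 0xE9}; // the letter e-acute encoded as ISO-8859-1
        // Both lines print the same charset name, regardless of the bytes.
        System.out.println(readerCharsetWithoutArgument(latin1));
        System.out.println(Charset.defaultCharset());
    }
}
```

The reader's charset matches Charset.defaultCharset() no matter what the input bytes contain.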
Sometimes the encoding is recorded somewhere: Unicode text may start with an optional byte order mark (BOM) at the beginning of the file, and HTML and XML can declare their charset.
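Checking for a BOM only requires looking at the first few bytes. A minimal sketch, assuming the caller has already read the start of the file into a byte array (the class name is hypothetical). It covers only the UTF-8 and UTF-16 marks, and an empty result means "unknown", not "ISO-8859-1":

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

final class BomSniffer {
    // Inspects the first bytes for a Unicode byte order mark.
    // Returns empty when there is no BOM; most UTF-8 files and all
    // ISO-8859-1 files have none, so empty says nothing about the encoding.
    static Optional<Charset> charsetFromBom(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) {
            return Optional.of(StandardCharsets.UTF_8);
        }
        // UTF-32 marks (00 00 FE FF, FF FE 00 00) are not handled in this sketch;
        // FF FE 00 00 would be reported as UTF-16LE.
        if (startsWith(head, 0xFE, 0xFF)) {
            return Optional.of(StandardCharsets.UTF_16BE);
        }
        if (startsWith(head, 0xFF, 0xFE)) {
            return Optional.of(StandardCharsets.UTF_16LE);
        }
        return Optional.empty();
    }

    // Compares the leading bytes (as unsigned values) with the given prefix.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

When a BOM is found, the caller still has to skip its bytes before decoding: Java's UTF-8 decoder, for example, keeps a leading BOM as the character U+FEFF.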
If you downloaded the file from the internet, the charset may be given in the response headers. Say it was an HTML file served with Content-Type: text/html; charset=Windows-1251. Then you could read the file as Windows-1251 and always store it as UTF-8, modifying or adding a <meta charset="UTF-8">.
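A minimal sketch of the header route, assuming the Content-Type header value is already available as a string (for example from URLConnection.getContentType()). The class and method names are hypothetical, and updating the <meta charset> tag in the HTML is left out:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Optional;

final class ContentTypeCharset {
    // Extracts the charset parameter from a Content-Type value,
    // e.g. "text/html; charset=Windows-1251" -> windows-1251.
    static Optional<Charset> charsetOf(String contentType) {
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim();
                // Strip optional quotes, as in charset="UTF-8".
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Optional.of(Charset.forName(name));
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names.
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }

    // Decodes the body with the declared charset and re-encodes it as UTF-8.
    static byte[] toUtf8(byte[] body, Charset declared) {
        return new String(body, declared).getBytes(StandardCharsets.UTF_8);
    }
}
```

For example, the Cyrillic letter Ya is the single byte 0xDF in Windows-1251 and becomes the two bytes 0xD0 0xAF after conversion to UTF-8.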
But in general there is no reliable way to determine a file's encoding. You could do:
There may be libraries that do this, combining language recognition with charset detection.
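One common heuristic, sketched here as an assumption rather than a general solution: try decoding the bytes as strict UTF-8, and if that fails, fall back to a single-byte charset such as ISO-8859-1. It works reasonably well for telling those two apart, because Latin-1 text containing accented letters rarely happens to be valid UTF-8; it cannot distinguish between different single-byte charsets. The class name is hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class GuessDecoder {
    // Heuristic only: bytes that form valid UTF-8 are assumed to be UTF-8;
    // anything else is assumed to be in the fallback charset.
    static Charset guess(byte[] data, Charset fallback) {
        CharsetDecoder strictUtf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            strictUtf8.decode(ByteBuffer.wrap(data)); // throws on invalid UTF-8
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            return fallback;
        }
    }

    // Decodes the bytes with whichever charset the heuristic picked.
    static String decode(byte[] data, Charset fallback) {
        return new String(data, guess(data, fallback));
    }
}
```

Pure ASCII input is reported as UTF-8, which is harmless because ASCII bytes decode identically in ISO-8859-1.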
Upvotes: 2