Reputation: 99
I receive an HTML file that I need to read and parse. The file can be in plain English, Japanese, or any other language, with whatever character encoding that language requires. The problem occurs when the file is in Japanese with any of these encodings.
I tried reading the file with FileReader, but the result is all garbage characters. I also tried FileInputStream with a hard-coded Japanese encoding to check whether a Japanese file is read correctly, but the result is not as expected.
FileInputStream fis = new FileInputStream(htmlFile);
InputStreamReader isr = new InputStreamReader(fis, "ISO-2022-JP");
I don’t have much experience with character encoding and internationalization. Any suggestions on how I can read and write files with different encodings?
One more thing: I don't know how to get the character encoding of the HTML file I am reading. I understand that I need to write the file in the same encoding, but I'm not sure how to get the original file's encoding. Thanks,
Upvotes: 2
Views: 1471
Reputation: 346476
FileReader exists, but it implicitly uses the platform default encoding, which makes it pretty much useless.
The encoding of an HTML file can be declared in:
- The Content-Type HTTP header, but this is only available when you read the file from the web server, not when it's saved as a file
- A META tag in the HTML: <META http-equiv="Content-Type" content="text/html; charset=EUC-JP">
- An XML declaration (for XHTML): <?xml version="1.0" encoding="UTF-8"?>
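To make the META tag and XML declaration lookups above concrete, here is a rough sketch, not a definitive implementation. The class name HtmlCharsetSniffer and its methods are made up for illustration. It assumes the declaration sits in the first 4096 bytes and that the encoding is ASCII-compatible in that region, which should hold for EUC-JP, Shift_JIS, ISO-2022-JP and UTF-8 but not for UTF-16. The regex is deliberately naive and will also match a stray "charset=" in ordinary text, so a real HTML parser is the safer choice for anything serious.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds a declared charset and decodes with it.
final class HtmlCharsetSniffer {

    // Matches charset=EUC-JP (META tag) and encoding="UTF-8" (XML declaration).
    private static final Pattern DECLARED = Pattern.compile(
            "(?:charset|encoding)\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)",
            Pattern.CASE_INSENSITIVE);

    // Returns the charset declared near the start of the file, or the fallback.
    static Charset declaredCharset(byte[] bytes, Charset fallback) {
        int len = Math.min(bytes.length, 4096);
        // ISO-8859-1 maps every byte to one char, so this decode never fails.
        String head = new String(bytes, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = DECLARED.matcher(head);
        if (m.find()) {
            try {
                return Charset.forName(m.group(1));
            } catch (IllegalArgumentException e) {
                // Malformed or unsupported charset name: use the fallback.
            }
        }
        return fallback;
    }

    // Decodes the whole file content with the declared (or fallback) charset.
    static String decode(byte[] bytes, Charset fallback) {
        return new String(bytes, declaredCharset(bytes, fallback));
    }

    // Reads args[0] and writes it to args[1] in the same encoding.
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: HtmlCharsetSniffer <in.html> <out.html>");
            return;
        }
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[1]);
        byte[] bytes = Files.readAllBytes(in);
        // The UTF-8 fallback is an arbitrary choice for this sketch.
        Charset cs = declaredCharset(bytes, StandardCharsets.UTF_8);
        String html = new String(bytes, cs);
        // ... parse or modify html here ...
        Files.write(out, html.getBytes(cs));
    }
}
```

If the file declares nothing at all, you are left guessing, so pick the fallback to match whatever your files most commonly use.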
Upvotes: 4