Reputation: 12876
My application is set up to store text as UTF-8. I am reading files from various other organizations, and those files might be in UTF-8, Latin-1, ASCII, etc. Do I need to do anything special to ensure that files in these different encodings are converted to UTF-8 correctly? E.g., do I need to figure out which character encoding each file is in and explicitly convert it to UTF-8?
Or is the following sufficient?
Reader reader = new InputStreamReader(new FileInputStream("c:/file.txt"), "UTF-8");
Upvotes: 0
Views: 967
Reputation: 12215
You need to tell the reader the encoding of the file.
If your input can be in many different encodings, then you might have a problem: you cannot reliably detect an encoding from the bytes alone. See "How can I detect the encoding/codepage of a text file?"
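To illustrate why detection is only ever a guess, here is a minimal sketch of one common heuristic: try a strict UTF-8 decode and, if the bytes are not valid UTF-8, fall back to a caller-supplied encoding. The class name EncodingGuess and the method decode are made up for this example. Note that the heuristic cannot tell Latin-1 from other single-byte encodings, and pure ASCII passes as UTF-8 either way.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class EncodingGuess {

    // Decode as UTF-8 if the bytes are valid UTF-8; otherwise use the fallback.
    // This is a heuristic, not a reliable detector.
    static String decode(byte[] bytes, Charset fallback) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    // REPORT makes the decoder throw instead of silently
                    // substituting U+FFFD for invalid byte sequences.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8: assume the fallback encoding (an assumption!).
            return new String(bytes, fallback);
        }
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "café"; the escape avoids depending on source-file encoding.
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(latin1, StandardCharsets.ISO_8859_1));
        System.out.println(decode(utf8, StandardCharsets.ISO_8859_1));
    }
}
```

Both calls in main should give back the same string, but only because the fallback happened to match the real encoding of the first input.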
When you want to support different encodings, you basically have three options:
<?xml version="1.0" encoding="UTF-8" ?>
in XML files. Unfortunately, not all file formats – such as "plain text" files – have such metadata.
Upvotes: 2
Reputation: 53694
You have that wrong. You don't read into an encoding, you read from an encoding. The encoding you provide as the second argument to InputStreamReader should be the expected encoding of the source stream (the file).
Reader reader = new InputStreamReader(new FileInputStream("c:/file.txt"), "<encoding_of_file.txt>");
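As a rough illustration of what goes wrong when that argument does not match the file, the sketch below reads the same Latin-1 bytes twice, once with the correct source encoding and once as UTF-8. The class SourceEncodingDemo and its readAll method are invented for this example, and an in-memory byte array stands in for the file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the byte array stands in for a file on disk.
final class SourceEncodingDemo {

    // Decode all bytes using the given SOURCE encoding.
    static String readAll(byte[] bytes, Charset sourceEncoding) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), sourceEncoding)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "café", stored here as Latin-1 bytes.
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        // Correct source encoding: the text round-trips.
        String right = readAll(latin1, StandardCharsets.ISO_8859_1);
        // Wrong source encoding: the 0xE9 byte is not valid UTF-8 here,
        // so the reader substitutes the replacement character U+FFFD.
        String wrong = readAll(latin1, StandardCharsets.UTF_8);

        System.out.println(right.equals("caf\u00e9"));
        System.out.println(wrong.indexOf('\uFFFD') >= 0);
    }
}
```

The reader does not raise an error in the second case; the damage is silent, which is why passing "UTF-8" for every file is not sufficient.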
Once the data is in memory as Java chars or Strings, it is always UTF-16, regardless of the file's encoding. When you want to write the data out (assuming you always want to write it as UTF-8), you use:
Writer writer = new OutputStreamWriter(new FileOutputStream("destfile"), "UTF-8");
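Putting the reader and the writer together, a minimal sketch of the whole conversion might look like this. It assumes you already know the source encoding; the class Transcode, the methods toUtf8 and demo, and the temp files are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, shown only as a sketch.
final class Transcode {

    // Read 'source' using its known encoding and write it to 'dest' as UTF-8.
    static void toUtf8(Path source, Charset sourceEncoding, Path dest) throws IOException {
        try (Reader reader = new InputStreamReader(Files.newInputStream(source), sourceEncoding);
             Writer writer = new OutputStreamWriter(Files.newOutputStream(dest), StandardCharsets.UTF_8)) {
            char[] buf = new char[8192];
            int n;
            // Chars in 'buf' are UTF-16; the writer encodes them as UTF-8.
            while ((n = reader.read(buf)) != -1) {
                writer.write(buf, 0, n);
            }
        }
    }

    // Demo: write a Latin-1 temp file, convert it, return the UTF-8 bytes.
    static byte[] demo() {
        try {
            Path src = Files.createTempFile("latin1-", ".txt");
            Path dst = Files.createTempFile("utf8-", ".txt");
            Files.write(src, "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1));
            toUtf8(src, StandardCharsets.ISO_8859_1, dst);
            return Files.readAllBytes(dst);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Latin-1 stores "café" in 4 bytes; UTF-8 needs 2 bytes for the "é".
        System.out.println(demo().length);
    }
}
```

The source encoding is a parameter on purpose: it has to come from somewhere outside the bytes themselves (metadata, a convention agreed with the sender, or a guess).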
Upvotes: 6