How do I read files in different character encodings into a Java app that stores UTF-8?

Question

My application is set up to store text as UTF-8. I read files from various other organizations, and those files might be in UTF-8, Latin-1, ASCII, etc. Do I need to do anything special to ensure that files in these different character encodings are read correctly? For example, do I need to figure out which character encoding each file uses and explicitly convert it to UTF-8?

Or is the following sufficient?

Reader reader = new InputStreamReader(new FileInputStream("c:/file.txt"), "UTF-8");

jtahlborn · Accepted Answer

You have that backwards. You don't read into an encoding, you read from an encoding. The encoding you pass as the second argument to InputStreamReader must be the actual encoding of the source stream (the file), so yes, you need to know or determine it for each file.

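// sourceEncoding is a placeholder for the charset name the file was actually written in, e.g. "ISO-8859-1" for Latin-1
Reader reader = new InputStreamReader(new FileInputStream("c:/file.txt"), sourceEncoding);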
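To illustrate why the source encoding matters, here is a minimal sketch that decodes the same bytes twice, once with the correct charset and once with the wrong one. The class name DecodeDemo and the helper decode are made up for this example, and an in-memory byte array stands in for a file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Hypothetical helper: decodes raw bytes into a String using the given source charset.
    static String decode(byte[] bytes, Charset sourceCharset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), sourceCharset)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // "café" encoded as Latin-1: the é is the single byte 0xE9.
        byte[] latin1Bytes = {'c', 'a', 'f', (byte) 0xE9};

        // Correct source charset: the text round-trips intact.
        String good = decode(latin1Bytes, StandardCharsets.ISO_8859_1);
        System.out.println(good.equals("caf\u00E9"));

        // Wrong source charset: 0xE9 alone is not valid UTF-8, so the
        // reader substitutes the replacement character U+FFFD.
        String bad = decode(latin1Bytes, StandardCharsets.UTF_8);
        System.out.println(bad.indexOf('\uFFFD') >= 0);
    }
}
```

Note that InputStreamReader does not throw on malformed input; it silently substitutes U+FFFD, so a wrong guess about the source encoding tends to show up as corrupted text rather than as an exception.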

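Once the data is in memory as Java chars or Strings, it is always UTF-16, regardless of the encoding it was read from. When you want to write the data out (assuming you always want to write it as UTF-8), you specify the target encoding on the writer: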

Writer writer = new OutputStreamWriter(new FileOutputStream("destfile"), "UTF-8");
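Putting the read and write halves together, a file can be converted to UTF-8 roughly as follows. This is a sketch under assumptions: the class name TranscodeToUtf8 and the method transcode are invented for illustration, and the caller is assumed to already know the source charset (nothing here detects it).

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class TranscodeToUtf8 {
    // Hypothetical helper: reads 'source' using its known charset and writes 'dest' as UTF-8.
    static void transcode(Path source, Charset sourceCharset, Path dest) throws IOException {
        try (Reader reader = new InputStreamReader(Files.newInputStream(source), sourceCharset);
             Writer writer = new OutputStreamWriter(Files.newOutputStream(dest), StandardCharsets.UTF_8)) {
            char[] buf = new char[8192]; // chars in memory are UTF-16 code units
            int n;
            while ((n = reader.read(buf)) != -1) {
                writer.write(buf, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("latin1-", ".txt");
        Path dst = Files.createTempFile("utf8-", ".txt");
        // "café" in Latin-1: é is the single byte 0xE9.
        Files.write(src, new byte[] {'c', 'a', 'f', (byte) 0xE9});

        transcode(src, StandardCharsets.ISO_8859_1, dst);

        // In UTF-8 the é becomes the two bytes 0xC3 0xA9.
        byte[] expected = {'c', 'a', 'f', (byte) 0xC3, (byte) 0xA9};
        System.out.println(Arrays.equals(Files.readAllBytes(dst), expected));
    }
}
```

Passing a Charset object instead of a charset name avoids the checked UnsupportedEncodingException that the String-based constructors declare.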
