rrevi

Reputation: 1051

Java character conversion to UTF-8

I am using:

InputStreamReader isr = new InputStreamReader(fis, "UTF8");

to read characters from a text file and convert them to UTF-8 characters.

My question is: what happens if one of the characters being read cannot be converted to UTF-8? Will there be an exception, or will the character be dropped?

Upvotes: 2

Views: 1479

Answers (2)

Aravind Yarram

Reputation: 80176

You are not converting from one charset to another. You are just indicating that the file is UTF-8 encoded, so that its bytes can be decoded into characters correctly.

If you want to convert from one encoding to another, you should do something like the following:

File infile = new File("x-utf8.txt");
File outfile = new File("x-utf16.txt");

String fromEncoding="UTF-8";
String toEncoding="UTF-16";

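// Decode the input with fromEncoding and re-encode it with toEncoding.
// try-with-resources closes (and flushes) both streams.
try (Reader in = new InputStreamReader(new FileInputStream(infile), fromEncoding);
     Writer out = new OutputStreamWriter(new FileOutputStream(outfile), toEncoding)) {
    char[] buf = new char[8192];
    int n;
    while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
    }
}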

After going through David Gelhar's response, I feel this code can be improved a bit. If you don't know the encoding of "infile", use the GuessEncoding library to detect it, and then construct the reader with the detected encoding.

Upvotes: 7

David Gelhar

Reputation: 27900

If the input file contains bytes that are not valid UTF-8, read() will by default replace each malformed byte sequence with U+FFFD (65533 decimal; the Unicode "replacement character"). No exception is thrown, and nothing is silently dropped.
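To see the default behavior in isolation, here is a minimal sketch. It feeds a byte array (rather than a file) through an InputStreamReader; the class name ReplacementDemo and the helper decodeLeniently are made up for this illustration. The byte 0xFF is used because it can never occur in well-formed UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ReplacementDemo {
    // Hypothetical helper: decodes bytes as UTF-8 using the reader's
    // default error handling (malformed input is replaced, not reported).
    static String decodeLeniently(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'a', then an invalid byte, then 'b'.
        byte[] bytes = {'a', (byte) 0xFF, 'b'};
        for (char ch : decodeLeniently(bytes).toCharArray()) {
            System.out.printf("U+%04X%n", (int) ch);
        }
        // Prints U+0061, U+FFFD, U+0062 on separate lines.
    }
}
```

The invalid byte shows up in the decoded text as U+FFFD, so you can detect it after the fact by scanning for that character, although a file that legitimately contains U+FFFD would look the same.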

If you need more control over this behavior, you can use:

InputStreamReader(InputStream in, CharsetDecoder dec)

and supply a CharsetDecoder configured to your liking.
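As one possible configuration, the sketch below builds a decoder that reports malformed input instead of replacing it, so that reading fails with a CharacterCodingException (a subclass of IOException). The class name StrictDecodeDemo and the helper isValidUtf8 are made up for this illustration, and an in-memory byte array stands in for the file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictDecodeDemo {
    // Hypothetical helper: returns true if the bytes decode cleanly as UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder signal an error instead of
        // substituting U+FFFD.
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(bytes), dec)) {
            while (r.read() != -1) {
                // Just drain the reader; we only care whether decoding fails.
            }
            return true;
        } catch (CharacterCodingException e) {
            return false; // malformed or unmappable input was reported
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidUtf8(new byte[]{'o', 'k'}));        // true
        System.out.println(isValidUtf8(new byte[]{'a', (byte) 0xFF})); // false
    }
}
```

CodingErrorAction also offers IGNORE, which drops the offending bytes, and REPLACE, which is what the charset-name constructor gives you by default.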

Upvotes: 3
