Reputation: 28074
I have some bytes which should be UTF-8 encoded, but which may contain text in ISO-8859-1 encoding if the user somehow didn't manage to use his text editor the right way.
I read the file with an InputStreamReader:
InputStreamReader reader = new InputStreamReader(
        new FileInputStream(file), Charset.forName("UTF-8"));
But every time the user uses umlauts like "ä", which are invalid UTF-8 when stored in ISO-8859-1, the InputStreamReader does not complain but inserts placeholder characters.
Is there a simple way to make this throw an exception on invalid input?
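To make the behaviour concrete, here is a minimal sketch that reproduces it with an in-memory byte array instead of a file. The class name `LenientUtf8Demo`, the helper `decodeLeniently`, and the sample word are made up for illustration; the point is only that the `Charset`-based constructor swaps the bad byte for the replacement character U+FFFD rather than failing.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class LenientUtf8Demo {
    /** Hypothetical helper: decodes bytes the same way the code in the question does. */
    public static String decodeLeniently(byte[] bytes) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown to keep the sketch simple.
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // "Bär" saved as ISO-8859-1: the byte 0xE4 is not valid UTF-8 in this position.
        byte[] latin1 = "B\u00e4r".getBytes(StandardCharsets.ISO_8859_1);
        String decoded = decodeLeniently(latin1);
        // Reports whether a U+FFFD placeholder ended up in the decoded text.
        System.out.println(decoded.indexOf('\uFFFD') >= 0);
    }
}
```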
Upvotes: 7
Views: 1522
Reputation: 140210
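Simply add .newDecoder(). A decoder created this way reports malformed input by default, whereas the constructor that takes a Charset replaces it silently:
InputStreamReader reader = new InputStreamReader(
        new FileInputStream(file), Charset.forName("UTF-8").newDecoder());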
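If you want to check that claim rather than take it on trust, a small sketch like the following queries the actions of a freshly created decoder. The class and method names are invented for this example; according to the `CharsetDecoder` documentation both actions default to `REPORT`, so this should print `true`.

```java
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class DecoderDefaults {
    /** Hypothetical helper: true if a fresh UTF-8 decoder reports both kinds of error. */
    public static boolean freshDecoderReportsErrors() {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        return decoder.malformedInputAction() == CodingErrorAction.REPORT
                && decoder.unmappableCharacterAction() == CodingErrorAction.REPORT;
    }

    public static void main(String[] args) {
        System.out.println(freshDecoderReportsErrors());
    }
}
```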
Upvotes: 1
Reputation: 13890
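// Create a decoder and make it fail on bad input instead of replacing it.
CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder();
decoder.onMalformedInput(CodingErrorAction.REPORT);
decoder.onUnmappableCharacter(CodingErrorAction.REPORT);
// Reads through this reader now throw a CharacterCodingException on invalid bytes.
InputStreamReader reader = new InputStreamReader(
        new FileInputStream(file), decoder);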
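For anyone who wants to try this without a file on disk, here is a self-contained sketch of the same idea using a `ByteArrayInputStream`. The class `StrictUtf8Demo`, the helper `isValidUtf8`, and the sample word are assumptions made for the example, not part of the code above. The exception only shows up once you actually read from the reader, so the helper drains it completely.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8Demo {
    /** Hypothetical helper: true if the bytes decode as UTF-8 without errors. */
    public static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), decoder)) {
            char[] buffer = new char[1024];
            while (reader.read(buffer) != -1) {
                // Drain the stream; decoding errors surface from read().
            }
            return true;
        } catch (CharacterCodingException e) {
            // MalformedInputException and UnmappableCharacterException both land here.
            return false;
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown to keep the sketch simple.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "B\u00e4r".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "B\u00e4r".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(isValidUtf8(utf8));
        System.out.println(isValidUtf8(latin1));
    }
}
```

With a real file you would catch `CharacterCodingException` around your own read loop and, for instance, tell the user to re-save the file as UTF-8.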
Upvotes: 7