DecoderFallbackException trouble getting correct character

Question

Let's say I have a file with this input:

"Crème donut, $1.00"

If a user uploads the file incorrectly encoded as ANSI and I parse it using TextFieldParser with a UTF-8 encoding configured to throw an exception on invalid bytes, it correctly throws an exception. It reports:

"Unable to translate bytes [E8] at index 321 from specified code page to Unicode."

The exception's BytesUnknown property contains a byte array with a single entry, [232]; 232 is the decimal equivalent of 0xE8. What's odd is that "è" should really be Byte[2] { 195, 168 } in UTF-8, I believe.
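To make the byte values above concrete, here is a minimal sketch that encodes "è" both ways. It assumes the "ANSI" file is in a Latin-1-compatible code page such as Windows-1252 (both map "è" to the single byte 0xE8); the class and method names are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Text;

// Hypothetical demo class: shows how the same character is stored
// under UTF-8 versus a single-byte Latin-1-compatible encoding.
public static class EncodingBytesDemo
{
    // UTF-8 needs two bytes for U+00E8.
    public static byte[] Utf8Bytes(string s) => Encoding.UTF8.GetBytes(s);

    // ISO-8859-1 is used here as a stand-in for "ANSI" (assumption);
    // it agrees with Windows-1252 for bytes 0xA0 through 0xFF.
    public static byte[] Latin1Bytes(string s) =>
        Encoding.GetEncoding("iso-8859-1").GetBytes(s);

    public static void Main()
    {
        string eGrave = "\u00E8"; // "è", escaped so the source file's own encoding does not matter

        Console.WriteLine(string.Join(", ", Utf8Bytes(eGrave)));   // 195, 168
        Console.WriteLine(string.Join(", ", Latin1Bytes(eGrave))); // 232
    }
}
```

So the single byte 232 is what a single-byte encoding wrote to the file, while { 195, 168 } is what the same character would have looked like had the file been saved as UTF-8.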

I would like to report back to the user what character caused the discrepancy.

What is the best way to do this?

If I return Encoding.UTF8.GetString(ex.BytesUnknown), it returns the Unicode replacement character instead of "è". Presumably this is because 232 as a single byte is not a valid UTF-8 sequence.
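One possible approach, sketched below, is to decode the offending bytes with the encoding the file was probably saved in rather than with UTF-8. This is only a sketch under assumptions: it guesses that the file is Latin-1-compatible (ISO-8859-1 is used as a stand-in for "ANSI"), it calls GetString on a strict UTF8Encoding directly instead of going through TextFieldParser, and `UnknownByteReporter` and `DescribeUnknownBytes` are hypothetical names.

```csharp
using System;
using System.Text;

// Hypothetical helper for turning the bytes reported by a
// DecoderFallbackException into something readable for the user.
public static class UnknownByteReporter
{
    // Decodes the rejected bytes as ISO-8859-1 (assumption: the uploaded
    // file was saved in a Latin-1-compatible single-byte code page).
    public static string DescribeUnknownBytes(byte[] unknownBytes)
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        return latin1.GetString(unknownBytes);
    }

    public static void Main()
    {
        // Strict UTF-8: no BOM on encode, throw on invalid bytes on decode.
        var strictUtf8 = new UTF8Encoding(
            encoderShouldEmitUTF8Identifier: false,
            throwOnInvalidBytes: true);

        // Simulate the wrongly encoded upload: the text saved as a single-byte encoding.
        byte[] ansiBytes = Encoding.GetEncoding("iso-8859-1")
            .GetBytes("Cr\u00E8me donut, $1.00");

        try
        {
            strictUtf8.GetString(ansiBytes);
        }
        catch (DecoderFallbackException ex)
        {
            byte[] bad = ex.BytesUnknown ?? Array.Empty<byte>();
            string guess = DescribeUnknownBytes(bad);
            Console.WriteLine(
                $"Invalid byte(s) [{BitConverter.ToString(bad)}] at index {ex.Index}; " +
                $"possibly the character '{guess}' saved in a non-UTF-8 encoding.");
        }
    }
}
```

The decoded character is a guess, not a certainty: the exception only says which bytes were invalid as UTF-8, not which encoding produced them, so the message to the user is best worded as "possibly".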

What am I missing? It seems like I have all the information I need to be helpful to the user, but I'm unable to communicate it.

