DuncanMack

Reputation: 340

DecoderFallbackException trouble getting correct character

Let's say I have a file with this input:

"Crème donut, $1.00"

If a user uploads the file incorrectly encoded as ANSI and I parse it using TextFieldParser with a UTF8 encoding that is set to throw an exception on invalid bytes, it correctly throws an exception. It reports:

"Unable to translate bytes [E8] at index 321 from specified code page to Unicode."

The property "BytesUnknown" contains a byte array with a single entry of [232]. 232 is the decimal equivalent of E8. What's odd is that "è" in UTF8 should really be Byte[2] { 195, 168 }, I believe.

I would like to report back to the user what character caused the discrepancy.

What is the best way to do this?

If I return Encoding.UTF8.GetString(ex.BytesUnknown), it returns the Unicode replacement character instead of "è". Presumably this is because 232 as a single byte is not valid UTF8.

What am I missing? It seems like I have all the information I need to be helpful to the user, but I'm unable to communicate it.

Upvotes: 1

Views: 2029

Answers (1)

DuncanMack

Reputation: 340

I see the issue. In my example I was using "è" as a foreign character. This is \xE8 in ANSI but \xC3\xA8 in UTF8. A lone \xE8 byte is not a valid UTF8 sequence (in UTF8 it is the lead byte of a three-byte sequence and must be followed by continuation bytes), so a UTF8 decoder cannot turn it into the code point U+00E8. The byte only means "è" when it is decoded with the ANSI code page it was written in.
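
To make the byte values concrete, here is a minimal sketch. The class and method names are made up for illustration, and ISO-8859-1 stands in for "ANSI" on the assumption that the file came from a Western European code page (it agrees with Windows-1252 for "è").

```csharp
using System;
using System.Text;

public static class ByteDemo
{
    // Returns a hex dump (for example "C3-A8") of the bytes an encoding produces for a string.
    public static string ToHex(Encoding encoding, string text) =>
        BitConverter.ToString(encoding.GetBytes(text));

    // Decodes bytes as UTF8 with the default, non-throwing replacement fallback.
    public static string DecodeLenient(byte[] bytes) =>
        Encoding.UTF8.GetString(bytes);

    public static void Main()
    {
        // Assumption: ISO-8859-1 is a stand-in for the ANSI code page of the upload.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        Console.WriteLine(ToHex(latin1, "\u00E8"));        // E8
        Console.WriteLine(ToHex(Encoding.UTF8, "\u00E8")); // C3-A8

        // A lone 0xE8 is an incomplete UTF8 sequence, so it decodes to U+FFFD.
        Console.WriteLine(DecodeLenient(new byte[] { 0xE8 }) == "\uFFFD"); // True
    }
}
```

The same character therefore has two different byte representations, and the single byte 232 is only meaningful in the single-byte encoding.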

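I ended up using the following code, which works in my circumstances given the regional settings on my servers:

catch (DecoderFallbackException ex)
{
    // Encoding.Default is the system ANSI code page on .NET Framework;
    // on .NET Core and later it is UTF-8, so this relies on the server setup.
    var ansiEncoding = Encoding.Default;

    // Decode the rejected bytes as ANSI to recover the character the user intended.
    var ansiOutput = ansiEncoding.GetString(ex.BytesUnknown);

    throw new PageException("This file contains unexpected characters. The following character was found in the file: " + ansiOutput);
}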
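
If depending on the server's regional settings is a concern, one possible variation is to name the single-byte encoding explicitly. The sketch below is only an illustration under the assumption that the bad uploads are Western European text. `BadByteReporter` and `Describe` are hypothetical names, and the strict decoder in `Main` stands in for the TextFieldParser setup from the question.

```csharp
using System;
using System.Text;

public static class BadByteReporter
{
    // Assumption: the rejected bytes were written as ISO-8859-1 / Windows-1252 text.
    private static readonly Encoding Fallback = Encoding.GetEncoding("iso-8859-1");

    // Builds a message fragment with both the raw hex and a best-guess character.
    public static string Describe(byte[] bytesUnknown)
    {
        string hex = BitConverter.ToString(bytesUnknown);
        string guess = Fallback.GetString(bytesUnknown);
        return "byte(s) " + hex + ", probably '" + guess + "'";
    }

    public static void Main()
    {
        // Strict UTF8 decoder: throws DecoderFallbackException on invalid bytes.
        var strictUtf8 = new UTF8Encoding(
            encoderShouldEmitUTF8Identifier: false,
            throwOnInvalidBytes: true);

        byte[] ansiBytes = { 0x43, 0x72, 0xE8, 0x6D, 0x65 }; // "Crème" as single-byte text

        try
        {
            strictUtf8.GetString(ansiBytes);
        }
        catch (DecoderFallbackException ex)
        {
            Console.WriteLine(Describe(ex.BytesUnknown));
        }
    }
}
```

ISO-8859-1 and Windows-1252 agree on \xE8 but differ in the range \x80 to \x9F (curly quotes, the euro sign and so on). On .NET Core and later, code page 1252 itself is only available after registering CodePagesEncodingProvider from the System.Text.Encoding.CodePages package. Reporting the hex value alongside the guessed character keeps the message useful even when the guess is wrong.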

Upvotes: 3
