Sulphy

Reputation: 776

Receiving POST request with charset=ISO-8859-1 - How to convert to UTF-8?

I've got POST requests hitting my web-api app and the Content-Type header contains 'charset=ISO-8859-1'.

However, certain symbols, including the pound symbol £, come out as a funny question mark in a diamond.

enter image description here

I've paused my code execution as far up as the controller (so after the automatic model binding). Inspecting the model at this point, the content is already showing unsupported symbols like the ones in my image example above.

Does anyone know if web-api would have automatically passed the content into the model keeping the charset ISO-8859-1 intact?

I've tried to convert one of the fields in my model where all the unsupported characters reside. I've used an example by Microsoft found here: https://msdn.microsoft.com/en-us/library/kdcak6ye(v=vs.110).aspx

I thought it had worked because those funny-looking diamonds were replaced by normal question marks. The problem, though, is that the translation/mapping between the two character sets isn't 100% correct, as the pound currency symbol is now shown as a normal question mark. I'm now starting to wonder whether the automatic model binding has already done a charset conversion in some way, impeding my attempts.

If I had my way I'd be asking the client to change the charset that is being presented, but for now this isn't an option.

Thanks.

Upvotes: 1

Views: 6204

Answers (1)

Michael Domashchenko

Reputation: 1480

By the time the POSTed content reaches a method of an API controller, all the strings have already been converted into the CLR's internal two-bytes-per-character Unicode (UTF-16) representation, using the System.Text.Encoding that should match the charset specified in the Content-Type header.

If you see those diamonds with question marks in your string variables/fields, it's too late, because it means that the Encoding was not able to parse the byte stream properly and used those characters just as a fallback. Notice that you've got exact same diamond symbols for both '¬' and '£' characters.

Different Encoding implementations may use different placeholder symbols. More specifically, the diamond with a question mark (U+FFFD) is the default fallback character for the UTF-8 encoding, whereas the ISO-8859-1 encoding uses an ordinary question mark by default.

So, since you see diamond symbols, it looks like your requests are actually being processed with the UTF-8 encoding, which is rather unusual, since you say the Content-Type specifies ISO-8859-1.
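To make the byte-level picture concrete, here is a minimal sketch of what I suspect is happening. It is written in Python purely as a stand-in, on the assumption that .NET's Encoding classes behave analogously for these two bytes; it is not Web API code.

```python
# The two characters from the question (U+00AC and U+00A3),
# encoded the way an ISO-8859-1 client would send them: one byte each.
raw = "\u00ac\u00a3".encode("iso-8859-1")   # b'\xac\xa3'

# Decoding with the declared charset recovers the original text.
as_latin1 = raw.decode("iso-8859-1")

# Decoding the same bytes as UTF-8 fails on both of them, so each one
# becomes the same U+FFFD replacement character (the diamond).
as_utf8 = raw.decode("utf-8", errors="replace")

# Converting the damaged string afterwards cannot restore anything:
# U+FFFD has no ISO-8859-1 equivalent, so it turns into a plain '?'.
back = as_utf8.encode("iso-8859-1", errors="replace")

print(repr(as_latin1), repr(as_utf8), back)
```

This would also explain why converting the model field after binding only swapped the diamonds for ordinary question marks: the original byte values are already gone by then.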

In Web API, the formatting/parsing of raw HTTP requests and responses is handled by descendants of System.Net.Http.Formatting.MediaTypeFormatter registered in HttpConfiguration.Formatters, which by default contains the following four instances:

[0]: {System.Net.Http.Formatting.JsonMediaTypeFormatter}
[1]: {System.Net.Http.Formatting.XmlMediaTypeFormatter}
[2]: {System.Net.Http.Formatting.FormUrlEncodedMediaTypeFormatter}
[3]: {System.Web.Http.ModelBinding.JQueryMvcFormUrlEncodedFormatter}

Each of those has a SupportedEncodings property that determines which encodings the formatter is prepared to handle. By default the first two are configured to handle UTF-8 and Unicode (which is UTF-16), but they are configured to throw an exception if they encounter an error in the input stream rather than insert a fallback character.
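The difference between throwing and falling back can be sketched as follows. Again this is Python used as an analogy, not the formatters' actual code; the body and the `decode_strict` helper are made up for illustration.

```python
# Hypothetical request body: the pound sign as the single ISO-8859-1 byte 0xA3.
raw = b"price=\xa3100"

def decode_strict(data: bytes, charset: str) -> str:
    """Decode and raise on the first invalid byte instead of substituting."""
    return data.decode(charset, errors="strict")

try:
    decode_strict(raw, "utf-8")
    outcome = "decoded"
except UnicodeDecodeError as exc:
    # A strict decoder refuses the input, so no diamonds can reach the model.
    outcome = f"rejected at byte offset {exc.start}"

# A lenient decoder silently substitutes U+FFFD for the bad byte.
lenient = raw.decode("utf-8", errors="replace")

print(outcome)
print(repr(lenient))
```

If your body were going through a strict decoder, I would expect a failed request rather than a model full of diamonds, which is why the symptom points at a fallback-style decode somewhere.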

The third one, FormUrlEncodedMediaTypeFormatter, does not use the SupportedEncodings property and simply decodes the body according to the charset specified in the Content-Type header. It handled the decoding correctly in my tests.
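For a form-encoded body the same effect shows up in the percent-escapes. A small sketch using Python's standard `urllib.parse.parse_qs`, with a hypothetical body, standing in for what a charset-aware form parser does:

```python
from urllib.parse import parse_qs

# Hypothetical form body: %A3 is the pound sign as an ISO-8859-1 byte.
body = "amount=%A3100&item=tea"

# Honouring the declared charset gives the expected text.
declared = parse_qs(body, encoding="iso-8859-1")

# Assuming UTF-8 instead turns the lone 0xA3 byte into U+FFFD.
assumed_utf8 = parse_qs(body, encoding="utf-8", errors="replace")

print(declared["amount"])
print(assumed_utf8["amount"])
```

So if your requests are form-encoded and you still get diamonds, it is worth checking whether the charset in the header is really reaching the parser.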

You can enable tracing for Web API to see whether it provides any hints about what is actually going on, especially whether any exceptions occur while the request is being processed, before it reaches the API controller.

Another possible reason for your problem could be that the client does not handle the encoding correctly, meaning that the value specified in the Content-Type does not match the actual byte stream of the request. You could check the raw stream by using a network analyzer such as Wireshark, or by enabling tracing for System.Net.Sockets.
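Once you have the raw body bytes from a capture, a rough check like the one below can tell you which encoding they look like. This is only a heuristic sketch in Python; `describe_body` is a hypothetical helper, and the sample bodies are invented.

```python
def describe_body(raw: bytes) -> str:
    """Roughly classify captured body bytes by how they decode."""
    if all(b < 0x80 for b in raw):
        return "ascii"          # no byte can distinguish the two charsets
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not-utf8"       # consistent with a single-byte charset such as ISO-8859-1
    return "utf8"               # non-ASCII bytes form valid UTF-8 sequences

# The pound sign is the single byte A3 in ISO-8859-1 but the pair C2 A3 in UTF-8.
latin1_body = "price=\u00a3100".encode("iso-8859-1")
utf8_body = "price=\u00a3100".encode("utf-8")

print(latin1_body.hex(" "), describe_body(latin1_body))
print(utf8_body.hex(" "), describe_body(utf8_body))
```

In the capture, look at the bytes where the £ should be: a lone `a3` means the client really is sending ISO-8859-1, while `c2 a3` means it is sending UTF-8 despite what the header claims.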

Upvotes: 3
