HttpWebRequest and Unicode characters

Question

I am using this code:

HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
string result = null;
using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
using (StreamReader reader = new StreamReader(resp.GetResponseStream()))
{
   // No encoding is passed, so StreamReader assumes UTF-8.
   result = reader.ReadToEnd();
}

In result I get text like 003cbr /003e003cbr /003e (I think this should be two line breaks instead). I tried the two- and three-parameter overloads of StreamReader, but the string was the same. (The request returns a JSON string.)

Why am I getting those characters, and how can I avoid them?

Jon Skeet · Accepted Answer

It's not really clear what that text is, but you're not specifying an encoding at the moment. What content encoding is the server using? StreamReader will default to UTF-8.
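
As a minimal sketch of specifying the encoding rather than relying on the default, the helper below passes an Encoding to StreamReader explicitly. The class and method names (EncodingSketch, ReadAll) are hypothetical and not part of the original code. UTF-8 is only an assumption here; with a real response you could derive the encoding from resp.CharacterSet instead.

```csharp
using System.IO;
using System.Text;

public static class EncodingSketch
{
    // Hypothetical helper: decode a stream using an explicitly chosen encoding
    // instead of StreamReader's UTF-8 default.
    public static string ReadAll(Stream stream, Encoding encoding)
    {
        using (var reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }
}
```

With the code from the question, this could be called as EncodingSketch.ReadAll(resp.GetResponseStream(), Encoding.UTF8).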

It sounds like you're actually getting some sort of oddly-encoded HTML, as U+003C is < and U+003E is >, giving <br /><br /> as the content. That's not JSON...

Two tests:

Use WebClient.DownloadString, which will detect the right encoding to use
See what gets shown using the same URL in a browser

EDIT: Okay, now that I've seen the text, it's actually got:

\u003cbr /\u003e

The \u part is important here - it's the JSON escape syntax, which states that the next four characters are the hex representation of a UTF-16 code unit.
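
To make the escape rule concrete, here is a small sketch that converts the four hex digits following \u into a single UTF-16 code unit. The names EscapeSketch and DecodeCodeUnit are hypothetical, and this is for illustration only, not a substitute for a real JSON parser.

```csharp
using System.Globalization;

public static class EscapeSketch
{
    // Hypothetical helper: interpret the four hex digits that follow "\u"
    // as one UTF-16 code unit.
    public static char DecodeCodeUnit(string fourHexDigits)
    {
        return (char)ushort.Parse(fourHexDigits, NumberStyles.HexNumber, CultureInfo.InvariantCulture);
    }
}
```

For example, DecodeCodeUnit("003c") yields '<' and DecodeCodeUnit("003e") yields '>'.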

Any JSON API used to parse that text should perform the unescaping for you.
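
A minimal sketch of letting a parser do the unescaping, assuming System.Text.Json is available (.NET Core 3.0 or later). The answer does not name a specific library, so this choice is an assumption, and other JSON parsers should behave the same way. The names JsonSketch and ReadStringProperty and the "html" property used in the usage note are hypothetical.

```csharp
using System.Text.Json;

public static class JsonSketch
{
    // Hypothetical helper: parse a JSON object and return one string property.
    public static string ReadStringProperty(string json, string propertyName)
    {
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            // GetString() returns the value with JSON escapes such as \u003c already resolved.
            return doc.RootElement.GetProperty(propertyName).GetString();
        }
    }
}
```

Given the raw text {"html":"\u003cbr /\u003e\u003cbr /\u003e"}, ReadStringProperty(raw, "html") returns <br /><br />.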
