e-MEE

Reputation: 508

HttpWebRequest and Unicode characters

I am using this code:

HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
string result = null;
using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
{
   StreamReader reader = new StreamReader(resp.GetResponseStream());
   result = reader.ReadToEnd();
   reader.Close();
}

In result I get text like 003cbr /003e003cbr /003e (I think this should be two line breaks instead). I tried the two- and three-parameter StreamReader constructors, but the string was the same. (The request returns a JSON string.)

Why am I getting those characters, and how can I avoid them?

Upvotes: 2

Views: 3075

Answers (1)

Jon Skeet

Reputation: 1503479

It's not really clear what that text is, but you're not specifying an encoding at the moment. What content encoding is the server using? StreamReader will default to UTF-8.

It sounds like actually you're getting some sort of oddly-encoded HTML, as U+003C is < and U+003E is >, giving <br /><br /> as the content. That's not JSON...

Two tests:

  • Use WebClient.DownloadString, which will detect the right encoding to use
  • See what gets shown using the same URL in a browser

EDIT: Okay, now that I've seen the text, it's actually got:

\u003cbr /\u003e

The \u part is important here - that's part of the JSON syntax, which states that the next four characters form the hex representation of a UTF-16 code unit.
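
To make that concrete, here is a minimal sketch of what the four hex digits stand for. The EscapeDemo class and DecodeCodeUnit helper are made-up names for illustration only; a real JSON parser does this internally and also handles surrogate pairs.

```csharp
using System;
using System.Globalization;

static class EscapeDemo
{
    // Hypothetical helper: turns the four hex digits that follow "\u"
    // into the single UTF-16 code unit they represent.
    public static char DecodeCodeUnit(string fourHexDigits)
    {
        return (char)ushort.Parse(
            fourHexDigits, NumberStyles.HexNumber, CultureInfo.InvariantCulture);
    }

    static void Main()
    {
        Console.WriteLine(DecodeCodeUnit("003c")); // <
        Console.WriteLine(DecodeCodeUnit("003e")); // >
    }
}
```

Because the escape sequence consists only of ASCII characters, it reads the same under any common encoding, which would explain why changing the StreamReader encoding made no difference.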

Any JSON API used to parse that text should perform the unescaping for you.
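
As a rough illustration, the sketch below parses a small JSON document with System.Text.Json and reads the value back unescaped. The choice of System.Text.Json is an assumption (any JSON library should behave the same way), and the "html" property name and the JsonUnescapeDemo class are invented for the example, since the real response shape isn't shown in the question.

```csharp
using System;
using System.Text.Json;

static class JsonUnescapeDemo
{
    // Hypothetical helper: parses the JSON text and returns the "html"
    // property as an ordinary string, with \uXXXX escapes already resolved.
    public static string ExtractHtml(string json)
    {
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            return doc.RootElement.GetProperty("html").GetString();
        }
    }

    static void Main()
    {
        // The raw response text: the backslashes here are literal characters,
        // exactly as they would arrive from the server.
        string raw = "{\"html\":\"\\u003cbr /\\u003e\\u003cbr /\\u003e\"}";

        Console.WriteLine(ExtractHtml(raw)); // <br /><br />
    }
}
```

The string that comes out of the parser contains real < and > characters, so no manual replacement of the escape sequences is needed.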

Upvotes: 3
