Reputation: 156
I have simple code for fetching the response from a Vietnamese website, http://vnexpress.net, but there is a small problem. The first download works fine, but after that the content contains unknown symbols like this: �\b\0\0\0\0\0\0�\a`I�%&/m.... What is the problem?
string address = "http://vnexpress.net";
WebClient webClient = new WebClient();
webClient.Headers.Add("user-agent", "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11 AlexaToolbar/alxg-3.1");
webClient.Encoding = System.Text.Encoding.UTF8;
return webClient.DownloadString(address);
Upvotes: 3
Views: 2997
Reputation: 133975
You'll find that the response is GZipped. There doesn't appear to be a way to download that with WebClient, unless you create a derived class and modify the underlying HttpWebRequest to allow automatic decompression.
Here's how you'd do that:
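public class MyWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        // Only HTTP(S) requests support automatic decompression; other schemes are returned untouched.
        var httpRequest = request as HttpWebRequest;
        if (httpRequest != null)
        {
            // Sends Accept-Encoding and transparently decompresses gzip or deflate responses.
            httpRequest.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        }
        return request;
    }
}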
And to use it:
string address = "http://vnexpress.net";
// WebClient is IDisposable, so release it once the download is done.
using (var webClient = new MyWebClient())
{
    webClient.Headers.Add("user-agent", "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11 AlexaToolbar/alxg-3.1");
    webClient.Encoding = System.Text.Encoding.UTF8;
    return webClient.DownloadString(address);
}
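If subclassing is not an option, one alternative is to download the raw bytes and decompress them yourself. The following is only a minimal sketch under stated assumptions: GzipBody, LooksGzipped and Decode are hypothetical names, not part of WebClient, and the page is assumed to be UTF-8. It relies on the fact that a gzip stream starts with the bytes 0x1F 0x8B, which fits the \b\0\0\0... garbage shown in the question.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper, not part of the framework.
public static class GzipBody
{
    // True if the buffer starts with the gzip magic bytes 0x1F 0x8B.
    public static bool LooksGzipped(byte[] body)
    {
        return body != null && body.Length >= 2 && body[0] == 0x1F && body[1] == 0x8B;
    }

    // Decompresses the body when it looks gzipped, then decodes it as UTF-8.
    // Assumption: the page is UTF-8, as in the question.
    public static string Decode(byte[] body)
    {
        if (body == null)
            throw new ArgumentNullException("body");

        if (!LooksGzipped(body))
            return Encoding.UTF8.GetString(body);

        using (var input = new MemoryStream(body))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

With a plain WebClient this would be called as GzipBody.Decode(webClient.DownloadData(address)). Letting HttpWebRequest handle decompression is less code, so this mainly helps when you cannot change how the request is created.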
Upvotes: 9
Reputation: 57075
DownloadString requires that the server correctly indicate the charset in the Content-Type response header. If you watch in Fiddler, you'll see that the server instead sends the charset inside a META Tag in the HTML response body:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
If you need to handle responses like this, you need to either parse the HTML yourself or use a library like FiddlerCore to do this for you.
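As a rough illustration of the "parse the HTML yourself" route, here is a minimal sketch that looks for a charset declaration in a META tag and decodes the bytes accordingly. MetaCharset and Decode are hypothetical names, the regular expression is a simplification rather than a real HTML parser, and the body is assumed to be already decompressed.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the framework.
public static class MetaCharset
{
    // Captures the charset name from a tag such as
    // <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    private static readonly Regex CharsetPattern =
        new Regex(@"<meta[^>]*charset\s*=\s*[""']?([A-Za-z0-9_\-]+)", RegexOptions.IgnoreCase);

    // Decodes the body with the charset declared in a META tag,
    // or with the fallback encoding when none is found or it is unknown.
    public static string Decode(byte[] body, Encoding fallback)
    {
        // The tag itself is ASCII, so an ASCII probe of the first few KB is
        // enough to find it before the real encoding is known.
        string probe = Encoding.ASCII.GetString(body, 0, Math.Min(body.Length, 4096));
        Match match = CharsetPattern.Match(probe);
        if (match.Success)
        {
            try
            {
                return Encoding.GetEncoding(match.Groups[1].Value).GetString(body);
            }
            catch (ArgumentException)
            {
                // Unknown charset name: fall through to the fallback encoding.
            }
        }
        return fallback.GetString(body);
    }
}
```

It would be used as MetaCharset.Decode(webClient.DownloadData(address), Encoding.UTF8), so the declared charset wins and UTF-8 is only the fallback.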
Upvotes: 0
Reputation: 5037
Try this code and you'll be fine:
string address = "http://vnexpress.net";
WebClient webClient = new WebClient();
webClient.Headers.Add("user-agent", "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11 AlexaToolbar/alxg-3.1");
return Encoding.UTF8.GetString(Encoding.Default.GetBytes(webClient.DownloadString(address)));
Upvotes: 1