Reputation: 23
I'm trying to download a page as a string with WebClient. The page contains Japanese characters, but the result shows garbled characters like ,�^�p�Ǘ�.
var url = "http://www.itmedia.co.jp/im/articles/0609/14/news117.html";
using (var w = new WebClient())
{
    w.Encoding = Encoding.UTF8;
    var htmlData = w.DownloadString(url);
}
The value of htmlData doesn't show the Japanese characters.
Why isn't the text decoded into Japanese characters even though I set the encoding to UTF-8?
Upvotes: 1
Views: 2004
Reputation: 23
I changed the encoding from UTF-8 to shift_jis:
w.Encoding = Encoding.GetEncoding("shift_jis");
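To illustrate why the UTF-8 setting produced garbage, here is a minimal sketch that decodes the same Shift_JIS bytes both ways. The sample string and the names ShiftJisDemo and Decode are my own, not taken from the page. It assumes .NET Core or .NET 5+, where shift_jis is only available after registering CodePagesEncodingProvider; on the .NET Framework that line is not needed and should be removed.

```csharp
using System;
using System.Text;

static class ShiftJisDemo
{
    // Decodes raw bytes with the named encoding, e.g. "shift_jis" or "utf-8".
    public static string Decode(byte[] bytes, string encodingName)
    {
        return Encoding.GetEncoding(encodingName).GetString(bytes);
    }

    static void Main()
    {
        // Assumption: .NET Core / .NET 5+, where legacy code pages must be registered first.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Hypothetical sample text, encoded the way the server sends the page.
        byte[] raw = Encoding.GetEncoding("shift_jis").GetBytes("運用管理");

        // Wrong decoder: invalid UTF-8 byte sequences become U+FFFD replacement characters.
        Console.WriteLine(Decode(raw, "utf-8"));

        // Matching decoder: the original text comes back.
        Console.WriteLine(Decode(raw, "shift_jis"));
    }
}
```

The bytes are the same in both calls; only the decoder changes, which is what setting w.Encoding does for DownloadString.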
Upvotes: 0
Reputation: 82934
According to the third line of the page source, the page is encoded in Shift_JIS:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="ja" id="masterChannel-enterprise"><head>
<meta http-equiv="content-type" content="text/html;charset=shift_jis">
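If you don't want to hard-code the encoding, one option is to download the raw bytes and read the charset from that meta tag before decoding. This is only a rough sketch: CharsetSniffer, FindCharset and DownloadHtml are names I made up, the regular expression is a simplification that ignores the HTTP Content-Type header, and the fallback to UTF-8 is an assumption.

```csharp
using System;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

static class CharsetSniffer
{
    // Returns the first charset name found in the markup, or null if none is declared.
    public static string FindCharset(string html)
    {
        Match m = Regex.Match(html, @"charset\s*=\s*[""']?([A-Za-z0-9_\-]+)",
                              RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : null;
    }

    // Downloads the page as bytes, then decodes it with the declared charset.
    public static string DownloadHtml(string url)
    {
        using (var w = new WebClient())
        {
            byte[] raw = w.DownloadData(url);

            // The meta tag is plain ASCII, so an ASCII view of the first bytes is enough to find it.
            string head = Encoding.ASCII.GetString(raw, 0, Math.Min(raw.Length, 2048));

            // Assumption: fall back to UTF-8 when no charset is declared.
            string charset = FindCharset(head) ?? "utf-8";

            // GetEncoding throws ArgumentException for unknown names. On .NET Core / .NET 5+,
            // shift_jis also requires Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
            return Encoding.GetEncoding(charset).GetString(raw);
        }
    }
}
```

For the meta tag quoted above, FindCharset picks out shift_jis, so DownloadHtml would decode the page the same way as the hard-coded fix.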
Upvotes: 1
Reputation: 156968
If you open the page with Postman, you can see the headers of the response.
The headers show that the response is compressed with gzip. That is probably causing the scrambled response you see.
WebClient nowadays supports decompressing gzip automatically, but that was not always the case. (If I run your code on .NET 4.6.2 on Windows 10, I do get the right results.) You may be targeting an older version of the .NET Framework that doesn't support gzip decompression out of the box. The linked post should solve that.
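For reference, one common way to turn on decompression explicitly is to subclass WebClient and set AutomaticDecompression on the underlying request. This is a sketch of that general technique, not necessarily what the linked post describes, and GzipWebClient is a name I chose for illustration.

```csharp
using System;
using System.Net;

// Hypothetical helper: a WebClient that asks for and transparently unpacks gzip/deflate responses.
class GzipWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);

        // Only HTTP(S) requests expose AutomaticDecompression.
        var http = request as HttpWebRequest;
        if (http != null)
        {
            http.AutomaticDecompression =
                DecompressionMethods.GZip | DecompressionMethods.Deflate;
        }
        return request;
    }
}
```

You would then use new GzipWebClient() in place of new WebClient() in the question's code. Decompression only restores the original bytes, so the Encoding still has to match the page's charset.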
Upvotes: 0