Dure Sameen

Reputation: 133

Extract web site plain html

I am trying to access a website's content using the following code:

HttpClient httpClient = new HttpClient();
string htmlresult = "";

var response = await httpClient.GetAsync(url);

if (response.IsSuccessStatusCode)
{
    htmlresult = await response.Content.ReadAsStringAsync();
}

return htmlresult;

It gives me the right HTML for every site except https://www.yahoo.com, which returns what looks like an encrypted string instead of plain HTML, something like this:

   ‹       Ľç–ãF¶.øÿ<»Ž4Kj“ð¦ÔÒ½÷ž·îÊO0$ Úž~÷   4@D™U:ëNgK"bÛÄïÿõr¯4^ô 

How can I get simple html from this encrypted text?

Upvotes: 0

Views: 76

Answers (1)

Manfred Radlwimmer

Reputation: 13394

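The text is not encrypted, it is compressed. Yahoo serves compressed responses (browsers ask for them with Accept-Encoding: gzip, deflate, br, and the server marks the body with Content-Encoding), so in your case you are reading the raw gzipped body. Quick fix for your code: enable automatic decompression.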

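// Needs: using System.Net; using System.Net.Http; using System.Threading.Tasks;
private async Task<string> GetUrl(string url)
{
    // The handler decompresses gzip/deflate response bodies transparently
    // before ReadAsStringAsync sees them.
    HttpClientHandler handler = new HttpClientHandler()
    {
        AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
    };

    HttpClient httpClient = new HttpClient(handler);

    string htmlresult = "";

    var response = await httpClient.GetAsync(url);

    // Only read the body for 2xx responses; otherwise return an empty string.
    if (response.IsSuccessStatusCode)
    {
        htmlresult = await response.Content.ReadAsStringAsync();
    }

    return htmlresult;
}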
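
If you want to see what the handler does for you, here is a rough sketch of the same decompression done by hand with GZipStream. It runs without any network access: it compresses a sample string the way a server might, checks for the gzip signature bytes (0x1F 0x8B; the second one is probably the "‹" at the start of your output), and decompresses it again. The class and method names (GzipDemo, Compress, Decompress, LooksGzipped) are made up for this illustration, and it assumes the page is UTF-8.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper for illustration only; not part of HttpClient.
public static class GzipDemo
{
    // Gzip-compress a string, standing in for a compressed response body.
    public static byte[] Compress(string text)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                byte[] raw = Encoding.UTF8.GetBytes(text);
                gzip.Write(raw, 0, raw.Length);
            } // disposing the GZipStream flushes the gzip trailer
            return output.ToArray();
        }
    }

    // A gzip stream always starts with the two signature bytes 0x1F 0x8B.
    public static bool LooksGzipped(byte[] data)
    {
        return data != null && data.Length >= 2 && data[0] == 0x1F && data[1] == 0x8B;
    }

    // Decompress gzipped bytes back into text (assumes UTF-8 content).
    public static string Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

With a real response you would get the bytes from response.Content.ReadAsByteArrayAsync() and only call Decompress when LooksGzipped returns true. In practice AutomaticDecompression is the simpler route, since it also handles deflate.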

Upvotes: 2
