Dilshod K

Reputation: 3032

How to get html content from amazon using HttpWebRequest

I am trying to get HTML content from the Amazon website. Here is my code to create the request, get the response, and read it as a string:

    public static HttpWebResponse GetHttpWebResponse(string url)
    {
        HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(url);
        webRequest.ContentType = "text/xml";
        try
        {
            return (HttpWebResponse)webRequest.GetResponse();
        }
        catch (WebException e)
        {
            if (e.Response == null)
                throw new Exception("Cannot get response");
            return (HttpWebResponse)e.Response;
        }
    }

    public static string GetString(HttpWebResponse response)
    {
        Encoding encoding = Encoding.UTF8;
        using (var reader = new StreamReader(response.GetResponseStream(), encoding))
        {
            string responseText = reader.ReadToEnd();
            return responseText;
        }
    }

It works fine with other websites. However, when I try to get content from Amazon, for example from https://www.amazon.com/gp/product/B00AEISSHA/ref=ppx_yo_dt_b_asin_title_o00_s00?ie=UTF8&psc=1, I see encoded content:

[Screenshot: encoded content]

I tried changing the encoding and calling HttpUtility.HtmlDecode(html), but neither helped. Is there a simple way to get the content from Amazon?

Upvotes: 0

Views: 592

Answers (1)

Nicholas Bergesen

Reputation: 382

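You're not handling compression. Amazon most likely sends the page gzip-compressed, so without decompression you end up reading the compressed bytes as text, which is why changing the encoding or HTML-decoding doesn't help. If you update your web request like this, it should do the trick: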

public static HttpWebResponse GetHttpWebResponse(string url)
{
    HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(url);
    webRequest.ContentType = "text/xml";
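    // Transparently decompress gzip- or deflate-encoded response bodies
    webRequest.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;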
    try
    {
        return (HttpWebResponse)webRequest.GetResponse();
    }
    catch (WebException e)
    {
        if (e.Response == null)
            throw new Exception("Cannot get response");
        return (HttpWebResponse)e.Response;
    }
}
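For illustration, here is roughly what AutomaticDecompression does for you. This is a minimal sketch, assuming the server signals compression with a `Content-Encoding: gzip` header and that gzip is the only coding you need to handle. `ResponseReader` and `ReadBody` are hypothetical names, not part of .NET.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Net;
using System.Text;

// Hypothetical helper, not part of .NET: a manual equivalent of
// AutomaticDecompression for gzip-encoded responses only.
public static class ResponseReader
{
    // Decompresses the stream when the Content-Encoding header says "gzip",
    // then decodes the resulting bytes as UTF-8 text.
    public static string ReadBody(Stream raw, string contentEncoding)
    {
        Stream body = string.Equals(contentEncoding, "gzip", StringComparison.OrdinalIgnoreCase)
            ? new GZipStream(raw, CompressionMode.Decompress)
            : raw;

        // Disposing the reader also disposes body (and, through it, raw).
        using (var reader = new StreamReader(body, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    // Convenience overload for the HttpWebResponse used in the question.
    public static string ReadBody(HttpWebResponse response)
    {
        return ReadBody(response.GetResponseStream(), response.ContentEncoding);
    }
}
```

In practice, setting AutomaticDecompression on the request is simpler; the manual version only shows why the raw stream looked garbled when read directly as UTF-8.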

Upvotes: 4
