Jacobo Polavieja

Reputation: 786

Can't get HTML code through HttpWebRequest

I am trying to parse the HTML of the page at http://odds.bestbetting.com/horse-racing/today to build a list of races, etc. The problem is that I can't retrieve the HTML of the page. Here is the C# code of the function:

    public static string Http(string url) {
        Uri myUri = new Uri(url);
        // Create a 'HttpWebRequest' object for the specified url.
        HttpWebRequest myHttpWebRequest = (HttpWebRequest)WebRequest.Create(myUri);
        myHttpWebRequest.AllowAutoRedirect = true;
        // Send the request and wait for response.
        HttpWebResponse myHttpWebResponse = (HttpWebResponse)myHttpWebRequest.GetResponse();
        var stream = myHttpWebResponse.GetResponseStream();
        var reader = new StreamReader(stream);
        var html = reader.ReadToEnd();
        // Release resources of response object.
        myHttpWebResponse.Close();

        return html;
    }

When I execute the program calling the function it throws an exception on

    HttpWebResponse myHttpWebResponse = (HttpWebResponse)myHttpWebRequest.GetResponse();

which is:

Cannot handle redirect from HTTP/HTTPS protocols to other dissimilar ones.

I have read this question but I don't seem to have the same problem. I've also tried figuring something out by sniffing the traffic with Fiddler, but I can't see where it redirects to or anything similar. I have only extracted these two possible redirections: odds.bestbetting.com/horse-racing/2011-06-10/byCourse and odds.bestbetting.com/horse-racing/2011-06-10/byTime, but querying them produces the same result as above.

It's not the first time I have done something like this, but I'm really lost on this one. Any help?

Thanks!

Upvotes: 3

Views: 10209

Answers (2)

Jacobo Polavieja

Reputation: 786

I finally found the solution... it was indeed a problem with the headers, specifically the User-Agent header.

After lots of searching I found someone having the same problem as me with the same site. Although his code was different, the important bit was that he manually set the UserAgent property of the request to that of a browser. I think I had tried this before, but I may have done it badly... sorry.

The final code, in case it is of interest to anyone, is this:

    public static string Http(string url) {
        if (!string.IsNullOrEmpty(url))
        {
            Uri myUri = new Uri(url);
            // Create a 'HttpWebRequest' object for the specified url.
            HttpWebRequest myHttpWebRequest = (HttpWebRequest)WebRequest.Create(myUri);
            // Set the user agent as if we were a web browser
            myHttpWebRequest.UserAgent = @"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4";

            // Send the request; 'using' releases the response and the reader
            // even if reading the body throws.
            using (HttpWebResponse myHttpWebResponse = (HttpWebResponse)myHttpWebRequest.GetResponse())
            using (var reader = new StreamReader(myHttpWebResponse.GetResponseStream()))
            {
                return reader.ReadToEnd();
            }
        }
        else { return "NO URL"; }
    }

Thank you very much for helping.

Upvotes: 3

Paulo Santos

Reputation: 11567

There could be a dozen possible causes for your problem.

One of them is that the redirect from the server points to an FTP site, or something like that.

It could also be that the server requires some headers in the request that you are failing to provide.

Check what a browser would send to the site and try to replicate it.
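One way to check the redirect hypothesis is to turn off automatic redirects and look at the `Location` header yourself. This is only a minimal sketch, assuming the server answers with a 3xx status and a `Location` header; `RedirectProbe`, `IsHttpScheme`, `ResolveLocation` and `GetRedirectTarget` are hypothetical helper names, not part of the code in the question.

```csharp
using System;
using System.Net;

public static class RedirectProbe
{
    // True when the target uses http or https, the schemes the
    // "HTTP/HTTPS ... dissimilar ones" error message refers to.
    public static bool IsHttpScheme(Uri target)
    {
        return target.Scheme == Uri.UriSchemeHttp || target.Scheme == Uri.UriSchemeHttps;
    }

    // Resolves a Location header value (absolute or relative) against the request URI.
    public static Uri ResolveLocation(Uri requestUri, string location)
    {
        return new Uri(requestUri, location);
    }

    // Sends the request without following redirects and returns the redirect
    // target, or null when the response carries no Location header.
    public static Uri GetRedirectTarget(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.AllowAutoRedirect = false;
        using (var response = (HttpWebResponse)request.GetResponse())
        {
            string location = response.Headers["Location"];
            return location == null ? null : ResolveLocation(request.RequestUri, location);
        }
    }
}
```

If `GetRedirectTarget` returns a URI for which `IsHttpScheme` is false (for example an `ftp://` address), that would point to the redirect as the cause; otherwise the headers are the more likely suspect.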
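As a rough illustration of replicating a browser's request, the sketch below sets a few headers before sending. The User-Agent string is the one from the accepted fix in this thread; the `Accept` and `Accept-Language` values are assumptions modelled on what a browser typically sends, and `BrowserLikeRequest` is a hypothetical helper name. Compare against what Fiddler shows for a real browser and adjust.

```csharp
using System;
using System.Net;

public static class BrowserLikeRequest
{
    // Builds (but does not send) a GET request carrying a few headers
    // that a browser would normally include.
    public static HttpWebRequest Create(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4";
        // Assumed values; copy the real ones from a captured browser request.
        request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
        request.Headers["Accept-Language"] = "en-US,en;q=0.5";
        return request;
    }
}
```

Calling `GetResponse()` on the returned request then sends it with those headers attached.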

Upvotes: 1
