Reputation: 3440
When I call the site www.livescore.com with the HttpClient class, I always get a 500 error. The server has probably blocked requests from HttpClient.
1) Is there any other method to get the HTML from a webpage?
2) How can I set the headers to get the HTML content?
When I set the headers like a browser does, I always get strangely encoded content.
http_client.DefaultRequestHeaders.TryAddWithoutValidation("Accept", "text/html,application/xhtml+xml,application/xml");
http_client.DefaultRequestHeaders.TryAddWithoutValidation("Accept-Encoding", "gzip, deflate");
http_client.DefaultRequestHeaders.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0");
http_client.DefaultRequestHeaders.TryAddWithoutValidation("Accept-Charset", "ISO-8859-1");
3) How can I solve this problem? Any suggestions?
I am using a Windows 8 Metro-style app in C# and the HttpClient class.
Upvotes: 40
Views: 57448
Reputation: 20157
Here you go. Note that you have to decompress the gzip-encoded result you get back, as mleroy pointed out:
private static readonly HttpClient _HttpClient = new();

private static async Task<string> GetResponse(string url, CancellationToken token = default)
{
    using var request = new HttpRequestMessage(HttpMethod.Get, new Uri(url));
    request.Headers.TryAddWithoutValidation("Accept", "text/html,application/xhtml+xml,application/xml");
    request.Headers.TryAddWithoutValidation("Accept-Encoding", "gzip, deflate");
    request.Headers.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0");
    request.Headers.TryAddWithoutValidation("Accept-Charset", "ISO-8859-1");

    using var response = await _HttpClient.SendAsync(request, token).ConfigureAwait(false);
    response.EnsureSuccessStatusCode();

    // The body comes back gzip-compressed (we asked for it via Accept-Encoding),
    // so decompress it before reading the text.
    await using var responseStream = await response.Content.ReadAsStreamAsync(token).ConfigureAwait(false);
    await using var decompressedStream = new GZipStream(responseStream, CompressionMode.Decompress);
    using var streamReader = new StreamReader(decompressedStream);
    return await streamReader.ReadToEndAsync(token).ConfigureAwait(false);
}
Call it like this:
var response = await GetResponse("http://www.livescore.com/").ConfigureAwait(false); // or var response = GetResponse("http://www.livescore.com/").Result;
Upvotes: 73
Reputation: 4729
You could also try this to add compression support:
var compressclient = new HttpClient(new HttpClientHandler()
{
    AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip
});
This also adds the corresponding Accept-Encoding headers automatically.
According to the same thread, support is now available in the Windows Store framework: http://social.msdn.microsoft.com/Forums/windowsapps/en-US/429bb65c-5f6b-42e0-840b-1f1ea3626a42/httpclient-data-compression-and-caching?prof=required
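As a usage sketch (assuming a console app; the User-Agent string is copied from the question, and the URL and variable names are just placeholders), fetching a page through such a client could look like:

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

class Program
{
    static async Task Main()
    {
        // The handler transparently decompresses gzip/deflate responses,
        // so GetStringAsync returns plain HTML with no manual GZipStream step.
        using var handler = new HttpClientHandler
        {
            AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip
        };
        using var client = new HttpClient(handler);
        client.DefaultRequestHeaders.TryAddWithoutValidation(
            "User-Agent",
            "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0");

        string html = await client.GetStringAsync("http://www.livescore.com/");
        Console.WriteLine(html.Length);
    }
}
```

Note that with AutomaticDecompression set, you should not wrap the response stream in a GZipStream yourself; the handler has already decompressed it.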
Upvotes: 26
Reputation: 3162
Several things to take note of.
That site requires you to provide a user agent, or it returns a 500 HTTP error.
A GET request to livescore.com responds with a 302 redirect to livescore.us. You need to handle the redirection or request livescore.us directly.
This code works using the .NET 4 Client Profile; I'll let you figure out whether it fits a Windows Store app.
var request = (HttpWebRequest)HttpWebRequest.Create("http://www.livescore.com");
request.AllowAutoRedirect = true;
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17";

string content;
using (var response = (HttpWebResponse)request.GetResponse())
using (var decompressedStream = new GZipStream(response.GetResponseStream(), CompressionMode.Decompress))
using (var streamReader = new StreamReader(decompressedStream))
{
    content = streamReader.ReadToEnd();
}
Upvotes: 5
Reputation: 738
I think you can be fairly certain they have done everything they can to stop developers from screen-scraping.
If I try from a standard C# project using this code:
var request = WebRequest.Create("http://www.livescore.com");
var response = request.GetResponse();
I get this response:
The remote server returned an error: (403) Forbidden.
Upvotes: 1