Reputation: 548
I am trying to download a web page using WebClient, but the call hangs until the WebClient timeout is reached and then fails with an exception.
The following code does not work:
WebClient client = new WebClient();
string url = "https://www.nasdaq.com/de/symbol/aapl/dividend-history";
string page = client.DownloadString(url);
With a different URL, the transfer works fine. For example, this code:
WebClient client = new WebClient();
string url = "https://www.ariva.de/apple-aktie";
string page = client.DownloadString(url);
completes very quickly and returns the whole HTML in the page variable.
Using HttpClient or WebRequest/WebResponse gives the same result for the first URL: the call blocks until a timeout exception is thrown.
Both URLs load fine in a browser, in roughly 2-5 seconds. Any idea what the problem is, and what solution is available?
I noticed that when the first URL is loaded in a WebBrowser control on a Windows Forms dialog, it produces 20+ JavaScript errors that each have to be confirmed with a click. The same errors show up in a browser's developer tools when accessing the first URL.
However, WebClient does NOT act on the response it receives: it does not run the JavaScript and does not load referenced pictures, CSS or other scripts, so this should not be the problem.
Thanks!
Ralf
Upvotes: 1
Views: 2710
Reputation: 32248
The first site, https://www.nasdaq.com/de/symbol/aapl/dividend-history, requires:
ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12
The User-agent is important here. If a recent User-agent is specified in WebRequest.UserAgent, the website may activate the HTTP/2 protocol and HSTS (HTTP Strict Transport Security). These are supported/understood only by recent browsers (as a reference, Firefox 56 or newer).
Using the User-agent of a less recent browser is necessary; otherwise the website will expect (and wait for) a dynamic response. With an older User-agent, the website activates the HTTP/1.1 protocol and never HSTS.
The second site, https://www.ariva.de/apple-aktie, requires:
ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12
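Applied to the question's own WebClient code, the two requirements above might look like the minimal sketch below. It assumes that TLS 1.2 plus an older User-agent is enough for the first site; the IE11 string (32-bit IE11 on 64-bit Windows 7) and the class name LegacyWebClientSetup are illustrative choices. Cookies and decompression are not configured here, which is where a derived WebClient class would come in.

```csharp
using System;
using System.Net;

// Hypothetical helper: a WebClient carrying the TLS and User-agent settings discussed above.
public static class LegacyWebClientSetup
{
    // Assumed older browser identity: 32-bit IE11 on 64-bit Windows 7.
    public const string IE11UserAgent =
        "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko";

    public static WebClient Create()
    {
        // Process-wide setting: affects every WebClient/WebRequest created afterwards.
        ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;

        var client = new WebClient();
        // WebClient copies these headers onto the underlying HttpWebRequest.
        client.Headers[HttpRequestHeader.UserAgent] = IE11UserAgent;
        client.Headers[HttpRequestHeader.Accept] =
            "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
        return client;
    }
}
```

Usage, with a fresh client per download because WebClient may not keep headers such as User-Agent across requests: using (var client = LegacyWebClientSetup.Create()) { string page = client.DownloadString(url); }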
I suggest setting up a WebRequest (or a corresponding HttpClient) this way (WebClient could work too, but it would probably require a derived custom class):
using System;
using System.IO;
using System.Net;
using System.Threading.Tasks;
// Members of the Form class:
private async void button1_Click(object sender, EventArgs e)
{
    button1.Enabled = false;
    Uri uri = new Uri("https://www.nasdaq.com/de/symbol/aapl/dividend-history");
    string destinationFile = "[Some Local File]";
    await HTTPDownload(uri, destinationFile);
    button1.Enabled = true;
}
CookieContainer httpCookieJar = new CookieContainer();
// The 32-bit IE11 header is the User-agent used here
public async Task HTTPDownload(Uri resourceURI, string filePath)
{
    // Windows 7 may require setting the protocol explicitly
    ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
    // Only blindly accept the server certificates if you know and trust the source
    ServicePointManager.ServerCertificateValidationCallback += (s, cert, ch, sec) => { return true; };
    ServicePointManager.DefaultConnectionLimit = 50;
    var httpRequest = WebRequest.CreateHttp(resourceURI);
    try
    {
        httpRequest.CookieContainer = httpCookieJar;
        // On .NET Framework, Timeout applies only to the synchronous GetResponse()
        httpRequest.Timeout = (int)TimeSpan.FromSeconds(15).TotalMilliseconds;
        httpRequest.AllowAutoRedirect = true;
        httpRequest.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        httpRequest.ServicePoint.Expect100Continue = false;
        httpRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko";
        httpRequest.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
        httpRequest.Headers.Add(HttpRequestHeader.AcceptEncoding, "gzip, deflate;q=0.8");
        httpRequest.Headers.Add(HttpRequestHeader.CacheControl, "no-cache");
        using (var httpResponse = (HttpWebResponse)await httpRequest.GetResponseAsync())
        using (var responseStream = httpResponse.GetResponseStream())
        {
            if (httpResponse.StatusCode == HttpStatusCode.OK)
            {
                try
                {
                    int buffersize = 132072;
                    // Copy the response body to the file in chunks
                    using (var fileStream = File.Create(filePath, buffersize, FileOptions.Asynchronous))
                    {
                        int read;
                        byte[] buffer = new byte[buffersize];
                        while ((read = await responseStream.ReadAsync(buffer, 0, buffer.Length)) > 0)
                        {
                            await fileStream.WriteAsync(buffer, 0, read);
                        }
                    }
                }
                catch (DirectoryNotFoundException) { /* Log or throw */ }
                catch (PathTooLongException) { /* Log or throw */ }
                catch (IOException) { /* Log or throw */ }
            }
        }
    }
    catch (WebException) { /* Log and message */ }
    catch (Exception) { /* Log and message */ }
}
The first website (nasdaq.com) returned a payload of 101,562 bytes.
The second website (www.ariva.de) returned a payload of 56,919 bytes.
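For the corresponding HttpClient setup mentioned in this answer, a minimal sketch might look like this. It assumes the same settings as the WebRequest code (TLS 1.2, a cookie container, GZip/Deflate decompression, a 15-second timeout, no Expect: 100-continue and the IE11 User-agent); LegacyHttpClientSetup is a hypothetical name, and this is one possible mapping of those settings, not a tested recipe for these sites.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper mirroring the WebRequest settings used in this answer.
public static class LegacyHttpClientSetup
{
    public const string IE11UserAgent =
        "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko";

    public static HttpClient Create(CookieContainer cookies)
    {
        // On .NET Framework, HttpClientHandler is built on HttpWebRequest and honors this
        // setting; on .NET Core it is ignored (HttpClientHandler.SslProtocols is used there).
        ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;

        var handler = new HttpClientHandler
        {
            CookieContainer = cookies,
            UseCookies = true,
            AllowAutoRedirect = true,
            AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
        };

        var client = new HttpClient(handler) { Timeout = TimeSpan.FromSeconds(15) };
        // TryAddWithoutValidation stores the header strings as written.
        client.DefaultRequestHeaders.TryAddWithoutValidation("User-Agent", IE11UserAgent);
        client.DefaultRequestHeaders.TryAddWithoutValidation(
            "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
        // Do not send "Expect: 100-continue" (the WebRequest code sets Expect100Continue = false).
        client.DefaultRequestHeaders.ExpectContinue = false;
        return client;
    }

    public static async Task<string> DownloadAsync(HttpClient client, Uri uri)
    {
        using (HttpResponseMessage response = await client.GetAsync(uri))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

Usage, inside an async method: using (var client = LegacyHttpClientSetup.Create(new CookieContainer())) { string page = await LegacyHttpClientSetup.DownloadAsync(client, new Uri(url)); }. HttpClient.Timeout also applies to these async calls, so a stalled server ends with a TaskCanceledException instead of an indefinite wait. In a real application the HttpClient would normally be created once and reused.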
Upvotes: 4
Reputation: 34160
Obviously there is a problem with downloading from that link (incorrect URL, unauthorized access, ...); however, you can use the async method to solve the blocking part:
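WebClient client = new WebClient();
client.DownloadStringCompleted += (s, e) =>
{
    // Handle the downloaded string here: check e.Error first, then read e.Result
};
// DownloadStringAsync expects a Uri, not a string
client.DownloadStringAsync(new Uri(url));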
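On .NET Framework 4.5 and later, WebClient also has DownloadStringTaskAsync, which can be awaited instead of subscribing to DownloadStringCompleted. A minimal sketch follows (AsyncPageDownloader is a hypothetical name). It keeps the calling thread, such as the UI thread, responsive, but it does not make the first site answer, so on its own it does not fix the stall described in the question.

```csharp
using System;
using System.Net;
using System.Threading.Tasks;

// Hypothetical helper: awaitable alternative to the DownloadStringCompleted event.
public static class AsyncPageDownloader
{
    public static async Task<string> DownloadAsync(string url)
    {
        using (var client = new WebClient())
        {
            // Returns Task<string>; failures are thrown here as exceptions
            // instead of being reported through e.Error in a completion handler.
            return await client.DownloadStringTaskAsync(new Uri(url));
        }
    }
}
```

Usage, from an async event handler: string page = await AsyncPageDownloader.DownloadAsync(url);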
Upvotes: 0