Reputation: 1875
I have a program which fetches the HTML source of ~500 web pages every 5 minutes. It runs correctly until the first failure (a page whose source can't be downloaded within 6 seconds); after that, all threads fail. If I restart the program, it again runs correctly until the first failure, and so on.
Where am I wrong, and what should I do to make this better?
This function runs every 5 minutes:
foreach (Company company in companies)
{
    string link = company.GetLink();
    Thread t = new Thread(() => F(company, link));
    t.Start();
    // wait up to 6 seconds for the download, then abort the thread
    if (!t.Join(TimeSpan.FromSeconds(6)))
    {
        Debug.WriteLine(company.Name + " Fails");
        t.Abort();
    }
}
And this function downloads the HTML code:
private void F(Company company, string link)
{
    try
    {
        string htmlCode = GetInformationFromWeb.GetHtmlRequest(link);
        company.HtmlCode = htmlCode;
    }
    catch (Exception ex)
    {
        // exceptions are silently ignored
    }
}
And this class:
public class GetInformationFromWeb
{
    public static string GetHtmlRequest(string url)
    {
        using (MyWebClient client = new MyWebClient())
        {
            client.Encoding = Encoding.UTF8;
            string htmlCode = client.DownloadString(url);
            return htmlCode;
        }
    }
}
And the web client class:
public class MyWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        HttpWebRequest request = base.GetWebRequest(address) as HttpWebRequest;
        request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
        return request;
    }
}
Upvotes: 0
Views: 77
Reputation: 377
I have four basic suggestions:
1. Use HttpClient instead of the obsolete WebClient. HttpClient can deal with asynchronous operations natively and has far more flexibility to take advantage of. You can even read the downloaded contents into strings/streams on a different thread, since you can configure await not to schedule your operations back. Or you can program the HttpClientHandler to break after 6 seconds and raise a TaskCanceledException if that limit is exceeded.
2. Don't silence exceptions (as in your F function), as it breaks debugging and obfuscates the real cause of problems. A correctly written program will never raise an exception during normal operation.
3. You don't need to spawn threads manually; use Tasks instead (for example, by calling them as Task.Run(async delegate() { await yourTask(); }), or AsyncContext.Run(...) if you need UI access) and nothing will be blocked.
4. Your GetInformationFromWeb class is pointless at the moment, and you are spawning multiple client objects just as pointlessly, since one HTTP client object can handle multiple requests (if you'd use HttpClient), even without additional bloat: you instantiate it once as a static global variable with all necessary configuration and then call it from any place using as little code as client.GetStringAsync(Uri uri). See the sketch after this list.
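For illustration, here is a minimal sketch of suggestions 1, 3, and 4 combined, reusing the Company type from the question; the Downloader, FetchAsync, and FetchAllAsync names are mine, not from the original code:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

public class Downloader
{
    // One shared client for the whole application; HttpClient is safe for
    // concurrent requests and should not be re-created per call.
    private static readonly HttpClient client = new HttpClient
    {
        // A request that exceeds this limit raises TaskCanceledException.
        Timeout = TimeSpan.FromSeconds(6)
    };

    private static async Task FetchAsync(Company company)
    {
        try
        {
            // Downloads the page body without blocking a thread.
            company.HtmlCode = await client.GetStringAsync(company.GetLink());
        }
        catch (TaskCanceledException)
        {
            Debug.WriteLine(company.Name + " timed out");
        }
        catch (HttpRequestException ex)
        {
            Debug.WriteLine(company.Name + " failed: " + ex.Message);
        }
    }

    // Call this every 5 minutes; it starts all downloads at once
    // and completes when every one has finished or failed.
    public static Task FetchAllAsync(IEnumerable<Company> companies)
    {
        return Task.WhenAll(companies.Select(FetchAsync));
    }
}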
OT: Is it some kind of an academic project?
Upvotes: 1
Reputation: 3543
If your foreach loops over 500 companies and each iteration creates a new thread, your internet connection can become a bottleneck, so requests will exceed the 6-second timeout and fail very often.
I suggest you try parallelism instead. Note MaxDegreeOfParallelism, which sets the maximum number of parallel executions; you can tune it to suit your needs.
Parallel.ForEach(companies, new ParallelOptions { MaxDegreeOfParallelism = 10 }, (company) =>
{
    try
    {
        string htmlCode = GetInformationFromWeb.GetHtmlRequest(company.GetLink());
        company.HtmlCode = htmlCode;
    }
    catch (Exception ex)
    {
        // ignore or process the exception
    }
});
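Note that GetHtmlRequest has no timeout of its own (HttpWebRequest defaults to 100 seconds). If the question's 6-second limit still matters, one option, sketched here on top of the question's MyWebClient rather than taken from this answer, is to set the timeout on the request itself instead of aborting threads:

protected override WebRequest GetWebRequest(Uri address)
{
    HttpWebRequest request = base.GetWebRequest(address) as HttpWebRequest;
    request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
    // Mirrors the original Join(TimeSpan.FromSeconds(6)) limit; a request
    // that exceeds it throws a WebException, which the catch block above
    // already handles, so Thread.Abort() is no longer needed.
    request.Timeout = 6000; // milliseconds
    return request;
}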
Upvotes: 1