Reputation: 9
I know what causes my StackOverflowException, but I can't find a good way to fix it. Everything I have tried so far either doesn't work or changes the method's results. The method Scrape(string link, Regex expression, WebClient webClient) returns a list of strings. The code below works fine without multithreading, but crawling on a single thread is really slow; my goal is to have at least 15 threads running. (I also tried increasing the stack size, without success.)
private void Crawl(List<String> links)
{
    List<String> scrapedLinks = new List<String>();
    foreach (string link in links)
    {
        List<String> scrapedItems = Scrape(link, new Regex(iTalk_TextBox_Small2.Text), new WebClient());
        foreach (string item in scrapedItems)
            listBox1.Invoke(new Action(delegate () { listBox1.Items.Add(item); }));
        iTalk_Label4.Invoke(new Action(delegate () { iTalk_Label4.Text = "Scraped Items: " + listBox1.Items.Count; }));
        if (scrapedItems.Count > 0 || !Properties.Settings.Default.Inspector)
        {
            foreach (string scrapedLink in Scrape(link, new Regex(@"https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)"), new WebClient()))
            {
                if (!Properties.Settings.Default.Blacklist.Contains(scrapedLink))
                    scrapedLinks.Add(scrapedLink);
            }
            scrapedLinksTotal += scrapedLinks.Count;
        }
        iTalk_Label5.Invoke(new Action(delegate () { iTalk_Label5.Text = "Scraped Links: " + scrapedLinksTotal; }));
    }
    Crawl(scrapedLinks);
}
Upvotes: 0
Views: 354
Reputation: 1403
Add a terminal condition. Without delving into the logic of what Crawl actually does, perhaps something as simple as this will fix the problem:
private void Crawl(List<String> links)
{
    //////////////////////////////////
    // Check for something to work on
    if (links == null || links.Count == 0)
        return; // Return if there is nothing to do.
    //////////////////////////////////

    List<String> scrapedLinks = new List<String>();
    foreach (string link in links)
    {
        List<String> scrapedItems = Scrape(link, new Regex(iTalk_TextBox_Small2.Text), new WebClient());
        foreach (string item in scrapedItems)
            listBox1.Invoke(new Action(delegate () { listBox1.Items.Add(item); }));
        iTalk_Label4.Invoke(new Action(delegate () { iTalk_Label4.Text = "Scraped Items: " + listBox1.Items.Count; }));
        if (scrapedItems.Count > 0 || !Properties.Settings.Default.Inspector)
        {
            foreach (string scrapedLink in Scrape(link, new Regex(@"https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)"), new WebClient()))
            {
                if (!Properties.Settings.Default.Blacklist.Contains(scrapedLink))
                    scrapedLinks.Add(scrapedLink);
            }
            scrapedLinksTotal += scrapedLinks.Count;
        }
        iTalk_Label5.Invoke(new Action(delegate () { iTalk_Label5.Text = "Scraped Links: " + scrapedLinksTotal; }));
    }
    Crawl(scrapedLinks);
}
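One caveat: this terminal condition only fires once a pass produces no new links, and pages that link back to each other will keep reappearing in scrapedLinks. A minimal sketch of a revisit guard, assuming a hypothetical visitedLinks field and that Crawl is only ever entered from one thread:

private readonly HashSet<string> visitedLinks = new HashSet<string>();

// Inside the inner foreach, keep only links not yet crawled;
// HashSet<T>.Add returns false for duplicates.
if (!Properties.Settings.Default.Blacklist.Contains(scrapedLink)
    && visitedLinks.Add(scrapedLink))
{
    scrapedLinks.Add(scrapedLink);
}

With that guard, every link is crawled at most once, so the links list is guaranteed to shrink to empty and the terminal condition above will eventually fire.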
Upvotes: 1
Reputation: 2901
A stack overflow is caused by infinite recursion in 99% of cases. Here, you are calling Crawl(scrapedLinks) unconditionally at the end of Crawl itself, so the recursion never terminates. I don't know what scrapedLinks is supposed to contain, but that call is the cause.
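Building on that: below is a minimal sketch of how the recursion could be replaced with an explicit work queue, which removes the stack-depth problem entirely and also addresses the 15-thread goal. It assumes the asker's Scrape method and shows only the link-following part; CrawlIteratively, LinkRegex, and the visited set are illustrative names, and the UI updates and blacklist check are omitted for brevity.

using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

// Same link-matching regex as in the question, hoisted into a field
// so it is constructed once instead of on every call.
private static readonly Regex LinkRegex = new Regex(
    @"https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)");

private void CrawlIteratively(List<string> seeds)
{
    var pending = new ConcurrentQueue<string>(seeds);
    var visited = new ConcurrentDictionary<string, bool>();

    while (!pending.IsEmpty)
    {
        // Drain the current frontier into a batch, skipping revisits.
        var batch = new List<string>();
        while (pending.TryDequeue(out string link))
            if (visited.TryAdd(link, true))
                batch.Add(link);

        // Crawl the batch with at most 15 concurrent workers.
        Parallel.ForEach(
            batch,
            new ParallelOptions { MaxDegreeOfParallelism = 15 },
            link =>
            {
                // WebClient is not thread-safe, so each worker gets its own.
                foreach (string found in Scrape(link, LinkRegex, new WebClient()))
                    pending.Enqueue(found);
            });
    }
}

Bounding the parallelism with MaxDegreeOfParallelism is usually preferable to spawning 15 raw threads, since the thread pool handles the scheduling, and the visited dictionary guarantees the loop terminates even when pages link to each other.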
Upvotes: 2