Reputation: 379
I want to run multiple threads at a time simultaneously (max 5 threads, for example) and when either one finishes, the new one starts with different data. (one finishes, one new start, two finishes, two new start...)
Main for loop is in main form, but run from a different thread not to block the UI.
When I run it, program adds 5 web browser controls (as a visual progress) and when the page is done loading it removes loaded ones.
The problem is no more controls is being added to the form.
Maybe semaphore is not released properly to allow new ones to start or am I missing something else?
And if I close the program, it doesn't exit, I think it gets blocked on WaitHandle.WaitOne because there are still more jobs to be done.
I removed some non needed data for more code clarity.
Semaphore pool = new Semaphore(5, 5);
Scraper[] scraper = new Scraper[5];
Gecko.GeckoWebBrowser wb = null;
int j = 0;
for (int i = 0; i < arrScrapeboxItems.Count; i++)
{
pool.WaitOne();
bool pustiMe = true;
while (pustiMe)
{
if (scraper[j] == null) scraper[j] = new Scraper();
if (scraper[j].tred == null)
{
ScrapeBoxItems sbi = (ScrapeBoxItems)arrScrapeboxItems[i];
doneEvents.Add(new ManualResetEvent(false)); // this is for WaitHandle.WaitAll after the for loop is done all the items
wb = new Gecko.GeckoWebBrowser();
PoolObjects po = new PoolObjects();
po.link = sbi.link;
// etc...
scraper[j].ThreadsCompleted += new Scraper.ThreadsHandler(frmMain_NextThreadItemsCompleted);
scraper[j].tred = new Thread(new ParameterizedThreadStart(scraper[j].Scrape));
scraper[j].tred.Start(po);
pustiMe = false;
if (j == maxThreads - 1)
j = 0;
else
j++;
break;
}
else if (scraper[j].tred.IsAlive) // if the thread is finished, make room for new thread
{
scraper[j] = null;
}
if (pustiMe) Thread.Sleep(1000);
}
}
// event from Scraper class
void frmMain_ThreadsCompleted()
{
pool.Release();
}
And the Scraper class look like:
public void Scrape(object o)
{
po = (PoolObjects)o;
// do stuff with po
po.form.Invoke((MethodInvoker)delegate
{
po.form.Controls.Add(po.wb);
po.wb.DocumentCompleted += new EventHandler<Gecko.Events.GeckoDocumentCompletedEventArgs>(wb_DocumentCompleted);
po.wb.Navigate(po.link);
});
}
void wb_DocumentCompleted(object sender, Gecko.Events.GeckoDocumentCompletedEventArgs e)
{
var br = sender as Gecko.GeckoWebBrowser;
if (br.Url == e.Uri)
{
form.Controls.Remove(po.wb);
ThreadsCompleted();
manualReset.Set();
}
}
Upvotes: 0
Views: 1252
Reputation: 134105
Either you have a typo or a huge bug. You have
else if (scraper[j].tred.IsAlive)
{
scraper[j] = null;
}
I think you want if (!scraper[j].tred.IsAlive)
. Otherwise, you'll end up overwriting an active Scraper
reference in the array.
More to the point, trying to maintain that array of Scraper
objects is causing you a lot of complication that you really don't need. You already have the semaphore controlling how many concurrent threads you can have, so the array of Scraper
objects is unnecessary noise.
Also, you don't want a whole bunch of ManualResetEvent
objects to wait on. WaitAll
can't wait on more than 63 items, so if you have more than that in your items list, WaitAll
isn't going to do it for you. I show below a better way to make sure all the jobs are completed.
for (int i = 0; i < arrScrapeboxItems.Count; i++)
{
pool.WaitOne();
ScrapeBoxItems sbi = (ScrapeBoxItems)arrScrapeboxItems[i];
wb = new Gecko.GeckoWebBrowser();
PoolObjects po = new PoolObjects();
po.link = sbi.link;
// more initialization of po ...
// and then start the thread
Thread t = new Thread(ScrapeThreadProc);
t.Start(po);
}
// Here's how you wait for all of the threads to complete.
// You have your main thread (which is running here) call `WaitOne` on the semaphore 5 times:
for (int i = 0; i < 5; ++i)
{
pool.WaitOne();
}
private void ScrapeThreadProc(object o)
{
var po = (PoolObjects)o;
Scraper scraper = new Scraper();
// initialize your Scraper object
scraper.ThreadsCompleted += new Scraper.ThreadsHandler(frmMain_NextThreadItemsCompleted);
scraper.Scrape(po);
// scraping is done. Dispose of the scraper and the po.
// and then release the semaphore
pool.Release();
}
That should greatly simplify your code.
The idea behind having the main thread wait on the semaphore 5 times is pretty simple. If the main thread can acquire the semaphore 5 times without calling Release
, then you know that there aren't any other jobs running.
There are other ways to do this, as well, but they would require some more involved restructuring of your code. You should look into the Task Parallel Library, specifically Parallel.ForEach, which will handle the threading for you. You can set the maximum number of concurrent threads to 5, so that you won't get too many threads going at once.
You could also do this using a producer/consumer setup with BlockingCollection
or some other shared queue.
In both of those scenarios, you end up creating 5 persistent threads that cooperatively process items from the list. That is typically more efficient than creating one thread for each item.
Upvotes: 2