Reputation: 2486
I am trying to create a application that multi threaded downloads images from a website, as a introduction into threading. (never used threading properly before)
But currently it seems to create 1000+ threads and I am not sure where they are coming from.
I first queue a thread into a thread pool, for starters i only have 1 job in the jobs array
foreach (Job j in Jobs)
{
ThreadPool.QueueUserWorkItem(Download, j);
}
Which starts the void Download(object obj)
on a new thread where it loops through a certain amount of pages (images needed / 42 images per page)
for (var i = 0; i < pages; i++)
{
var downloadLink = new System.Uri("http://www." + j.Provider.ToString() + "/index.php?page=post&s=list&tags=" + j.Tags + "&pid=" + i * 42);
using (var wc = new WebClient())
{
try
{
wc.DownloadStringAsync(downloadLink);
wc.DownloadStringCompleted += (sender, e) =>
{
response = e.Result;
ProcessPage(response, false, j);
};
}
catch (System.Exception e)
{
// Unity editor equivalent of console.writeline
Debug.Log(e);
}
}
}
correct me if I am wrong, the next void gets called on the same thread
void ProcessPage(string response, bool secondPass, Job j)
{
var wc = new WebClient();
LinkItem[] linkResponse = LinkFinder.Find(response).ToArray();
foreach (LinkItem i in linkResponse)
{
if (secondPass)
{
if (string.IsNullOrEmpty(i.Href))
continue;
else if (i.Href.Contains("http://loreipsum."))
{
if (DownloadImage(i.Href, ID(i.Href)))
j.Downloaded++;
}
}
else
{
if (i.Href.Contains(";id="))
{
var alterResponse = wc.DownloadString("http://www." + j.Provider.ToString() + "/index.php?page=post&s=view&id=" + ID(i.Href));
ProcessPage(alterResponse, true, j);
}
}
}
}
And finally passes on to the last function and downloads the actual image
bool DownloadImage(string target, int id)
{
var url = new System.Uri(target);
var fi = new System.IO.FileInfo(url.AbsolutePath);
var ext = fi.Extension;
if (!string.IsNullOrEmpty(ext))
{
using (var wc = new WebClient())
{
try
{
wc.DownloadFileAsync(url, id + ext);
return true;
}
catch(System.Exception e)
{
if (DEBUG) Debug.Log(e);
}
}
}
else
{
Debug.Log("Returned Without a extension: " + url + " || " + fi.FullName);
return false;
}
return true;
}
I am not sure how I am starting this many threads, but would love to know.
Edit
The goal of this program is to download the different job in jobs at the same time (max of 5) each downloading a maximum of 42 images at the time.
so a maximum of 210 images can/should be downloaded maximum at all times.
Upvotes: 4
Views: 226
Reputation: 28355
First of all, how did you measure the thread count? Why do you think that you have thousand of them in your application? You are using the ThreadPool
, so you don't create them by yourself, and the ThreadPool
wouldn't create such great amount of them for it's needs.
Second, you are mixing synchronious and asynchronious operations in your code. As you can't use TPL
and async/await
, let's go through you code and count the unit-of-works
you are creating, so you can minimize them. After you do this, the number of queued items in ThreadPool
will decrease and your application will gain performance you need.
You don't set the SetMaxThreads
method in your application, so, according the MSDN:
Maximum Number of Thread Pool Threads
The number of operations that can be queued to the thread pool is limited only by available memory; however, the thread pool limits the number of threads that can be active in the process simultaneously. By default, the limit is 25 worker threads per CPU and 1,000 I/O completion threads.
So you must set the maximum to the 5
.
I can't find a place in your code where you check the 42
images per Job, you are only incrementing the value in ProcessPage
method.
ManagedThreadId
for the handle of WebClient.DownloadStringCompleted
- does it execute in different thread or not.You are adding the new item in ThreadPool
queue, why are you using the asynchronious operation for Downloading? Use a synchronious overload, like this:
ProcessPage(wc.DownloadString(downloadLink), false, j);
This will not create another one item in ThreadPool
queue, and you wouldn't have a sinchronisation context switch here.
In ProcessPage
your wc
variable doesn't being garbage collected, so you aren't freeing all your resourses here. Add using
statement here:
void ProcessPage(string response, bool secondPass, Job j)
{
using (var wc = new WebClient())
{
LinkItem[] linkResponse = LinkFinder.Find(response).ToArray();
foreach (LinkItem i in linkResponse)
{
if (secondPass)
{
if (string.IsNullOrEmpty(i.Href))
continue;
else if (i.Href.Contains("http://loreipsum."))
{
if (DownloadImage(i.Href, ID(i.Href)))
j.Downloaded++;
}
}
else
{
if (i.Href.Contains(";id="))
{
var alterResponse = wc.DownloadString("http://www." + j.Provider.ToString() + "/index.php?page=post&s=view&id=" + ID(i.Href));
ProcessPage(alterResponse, true, j);
}
}
}
}
}
In DownloadImage
method you also use the asynchronious load. This also adds item in ThreadPoll
queue, and I think that you can avoid this, and use synchronious overload too:
wc.DownloadFile(url, id + ext);
return true;
So, in general, avoid the context-switching operations and dispose your resources properly.
Upvotes: 2
Reputation: 1422
Your wc WebClinet will go out of scope and be randomly garbage collected before the async callback. Also on all async calls you have to allow for immediate return and the actual delegated function return. So processPage will have to be in two places. Also the j in the original loop may be going out of scope depending on where Download in the original loop is declared.
Upvotes: 0