MX D

Reputation: 2486

Where do these 1k threads come from?

I am trying to create an application that downloads images from a website using multiple threads, as an introduction to threading (I have never used threading properly before).

But currently it seems to create 1000+ threads and I am not sure where they are coming from.

I first queue a work item onto the ThreadPool; for starters I only have one job in the Jobs array:

foreach (Job j in Jobs)
{
    ThreadPool.QueueUserWorkItem(Download, j);
}

This starts void Download(object obj) on a thread pool thread, where it loops through a number of pages (images needed / 42 images per page):

for (var i = 0; i < pages; i++)
{
    var downloadLink = new System.Uri("http://www." + j.Provider.ToString() + "/index.php?page=post&s=list&tags=" + j.Tags + "&pid=" + i * 42);

    using (var wc = new WebClient())
    {
        try
        {
            wc.DownloadStringAsync(downloadLink);
            wc.DownloadStringCompleted += (sender, e) =>
            {
                response = e.Result;  
                ProcessPage(response, false, j);
            };
        }
        catch (System.Exception e)
        {
            // Unity editor equivalent of console.writeline
            Debug.Log(e);
        }
    }
}

Correct me if I am wrong, but the next method gets called on the same thread:

void ProcessPage(string response, bool secondPass, Job j)
{
    var wc = new WebClient();
    LinkItem[] linkResponse = LinkFinder.Find(response).ToArray();

    foreach (LinkItem i in linkResponse)
    {
        if (secondPass)
        {
            if (string.IsNullOrEmpty(i.Href))
                continue;
            else if (i.Href.Contains("http://loreipsum."))
            {
                if (DownloadImage(i.Href, ID(i.Href)))
                    j.Downloaded++;
            }
        }
        else
        {
            if (i.Href.Contains(";id="))
            {
                var alterResponse = wc.DownloadString("http://www." + j.Provider.ToString() + "/index.php?page=post&s=view&id=" + ID(i.Href));
                ProcessPage(alterResponse, true, j);
            }
        }
    }
}

And finally it passes on to the last function, which downloads the actual image:

bool DownloadImage(string target, int id)
{
    var url = new System.Uri(target);
    var fi = new System.IO.FileInfo(url.AbsolutePath);
    var ext = fi.Extension;

    if (!string.IsNullOrEmpty(ext))
    {
        using (var wc = new WebClient())
        {
            try
            {
                wc.DownloadFileAsync(url, id + ext);
                return true;
            }
            catch(System.Exception e)
            {
                if (DEBUG) Debug.Log(e);
            }
        }
    }
    else
    {
        Debug.Log("Returned Without a extension: " + url + " || " + fi.FullName);
        return false;
    }
    return true;
}

I am not sure how I am starting this many threads, but I would love to know.

Edit

The goal of this program is to download the different jobs in Jobs at the same time (a maximum of 5), each downloading a maximum of 42 images at a time.

So a maximum of 210 images should be downloading at any given time.
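
To illustrate, this is roughly the shape I have in mind for capping the jobs. It is only a sketch, not my current code; the Semaphore, the jobSlots field and the StartJobs name are just my guess at how the limit could be enforced, while Jobs, Job and Download(object) are the same as above.

// Sketch of the 5-job cap I am after, not my actual code.
private readonly System.Threading.Semaphore jobSlots =
    new System.Threading.Semaphore(5, 5);

void StartJobs()
{
    foreach (Job j in Jobs)
    {
        ThreadPool.QueueUserWorkItem(state =>
        {
            jobSlots.WaitOne();      // take one of the 5 job slots
            try
            {
                Download(state);     // existing Download(object obj) from above
            }
            finally
            {
                jobSlots.Release();  // give the slot back when the job is done
            }
        }, j);
    }
}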

Upvotes: 4

Views: 226

Answers (2)

VMAtm

Reputation: 28355

First of all, how did you measure the thread count? Why do you think you have thousands of them in your application? You are using the ThreadPool, so you don't create them yourself, and the ThreadPool wouldn't create such a great number of them for its own needs.

Second, you are mixing synchronous and asynchronous operations in your code. As you can't use TPL and async/await, let's go through your code and count the units of work you are creating, so you can minimize them. Once you do this, the number of queued items in the ThreadPool will decrease and your application will gain the performance you need.

  1. You don't call the SetMaxThreads method in your application, so, according to MSDN:

    Maximum Number of Thread Pool Threads
    The number of operations that can be queued to the thread pool is limited only by available memory; however, the thread pool limits the number of threads that can be active in the process simultaneously. By default, the limit is 25 worker threads per CPU and 1,000 I/O completion threads.

    So you must set the maximum to 5 (see the sketch at the end of this answer).

  2. I can't find a place in your code where you check the limit of 42 images per Job; you are only incrementing the value in the ProcessPage method.

  3. Check the ManagedThreadId inside the handler of WebClient.DownloadStringCompleted to see whether it executes on a different thread or not.
  4. You are already adding an item to the ThreadPool queue, so why are you using the asynchronous operation for downloading? Use the synchronous overload, like this:

    ProcessPage(wc.DownloadString(downloadLink), false, j);
    

    This will not create another item in the ThreadPool queue, and you won't have a synchronization context switch here.

  5. In ProcessPage your wc variable is never disposed, so you aren't freeing all your resources there. Add a using statement:

    void ProcessPage(string response, bool secondPass, Job j)
    {
        using (var wc = new WebClient())
        {
            LinkItem[] linkResponse = LinkFinder.Find(response).ToArray();
    
            foreach (LinkItem i in linkResponse)
            {
                if (secondPass)
                {
                    if (string.IsNullOrEmpty(i.Href))
                        continue;
                    else if (i.Href.Contains("http://loreipsum."))
                    {
                        if (DownloadImage(i.Href, ID(i.Href)))
                            j.Downloaded++;
                    }
                }
                else
                {
                    if (i.Href.Contains(";id="))
                    {
                        var alterResponse = wc.DownloadString("http://www." + j.Provider.ToString() + "/index.php?page=post&s=view&id=" + ID(i.Href));
                        ProcessPage(alterResponse, true, j);
                    }
                }
            }
        }
    }
    
  6. In the DownloadImage method you also use the asynchronous download. This also adds an item to the ThreadPool queue; I think you can avoid that and use the synchronous overload here too:

    wc.DownloadFile(url, id + ext);
    return true; 
    

So, in general, avoid the context-switching operations and dispose of your resources properly.
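
Putting points 1, 4 and 6 together, the downloading part could look roughly like this. It is only a sketch: Job, pages, ProcessPage and Debug.Log are taken from the question, and the exact limit you pass to SetMaxThreads is up to you.

// Somewhere at start-up, before any job is queued.
// Note: SetMaxThreads returns false and changes nothing if the requested
// value is lower than what the runtime allows (e.g. the processor count).
ThreadPool.SetMaxThreads(5, 5);

void Download(object obj)
{
    var j = (Job)obj;                  // the Job passed to QueueUserWorkItem

    for (var i = 0; i < pages; i++)    // pages computed as in the question
    {
        var downloadLink = new System.Uri("http://www." + j.Provider.ToString()
            + "/index.php?page=post&s=list&tags=" + j.Tags + "&pid=" + i * 42);

        using (var wc = new WebClient())
        {
            try
            {
                // Point 4: synchronous download, so no extra ThreadPool item
                // and no completion callback to synchronize with.
                ProcessPage(wc.DownloadString(downloadLink), false, j);
            }
            catch (System.Exception e)
            {
                Debug.Log(e);
            }
        }
    }
}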

Upvotes: 2

Sql Surfer

Reputation: 1422

Your wc WebClient can go out of scope and be garbage collected at some arbitrary point before the async callback runs. Also, with any async call you have to allow for both the immediate return and the later call into the delegated function, so the ProcessPage handling will have to live in two places. Finally, the j in the original loop may be going out of scope, depending on where Download in the original loop is declared.
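
For example, something along these lines (only a sketch, reusing downloadLink, Job, ProcessPage and Debug.Log from the question; the pageClient field and StartPageDownload name are mine): keep the client referenced, hook the handler up before starting the call, and let the callback do the second half of the work.

// Sketch only. The field keeps the WebClient reachable until the callback
// has fired (one page at a time here), and the handler is attached before
// DownloadStringAsync is called.
private WebClient pageClient;

void StartPageDownload(System.Uri downloadLink, Job j)
{
    pageClient = new WebClient();

    pageClient.DownloadStringCompleted += (sender, e) =>
    {
        // Handy check: which thread is the callback actually running on?
        Debug.Log("Callback thread: "
            + System.Threading.Thread.CurrentThread.ManagedThreadId);

        if (e.Error != null)
        {
            Debug.Log(e.Error);      // async errors arrive here, not at the call site
            return;
        }

        ProcessPage(e.Result, false, j);   // j is captured by the lambda
        ((WebClient)sender).Dispose();
    };

    pageClient.DownloadStringAsync(downloadLink);   // returns immediately;
                                                    // the handler does the rest
}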

Upvotes: 0
