Michael
Michael

Reputation: 1833

Creating and Starting Tasks inside a Foreach

I was trying to scrape some data from a site. Here is my class:

class ClosureCraziness
{
    public string SaveFolder { get; set; }

    public void Save(Dictionary<string, string> idToWebLocation)
    {
        var tasks = new List<Task>();
        foreach (var kvp in idToWebLocation)
        {
            var task = new Task(() => Download(kvp.Key, kvp.Value));
            task.Start();
            tasks.Add(task);
        }

        Task.WaitAll(tasks.ToArray());
    }

    void Download(string id, string location)
    {
        var filename = $"{id}.html";
        string source = string.Empty;
        try
        {
            source = GetSource(location);
        }
        catch (Exception e)
        {
            // handle exception
        }

        var path = Path.Combine(SaveFolder, filename);
        using (var sw = new StreamWriter(path))
            sw.Write(source);
    }

    string GetSource(string location)
    {
        using (var client = new WebClient())
        {
            return client.DownloadString(location);
        }
    }
}

When I executed, I would wind up with something like the following. You'll notice that the contents of the file (the source that was downloaded) does not match the name:

Filename on Disk | File Contents

apple.html <html> apple </html>

orange.html <html> orange </html>

pear.html <html> peach </html>

peach.html <html> peach </html>

grape.html <html> apple </html>

plum.html <html> plum </html>

(I can't figure out how to format this nicely)

At first I was baffled since the file name on the disk was correct, and I was sure my Dictionary<string, string> was properly formed (I checked 6 times, all different ways), meaning the association of Id to web location was good.

I thought maybe it was a closure issue, having recalled Eric Lippert schooling me on the implementation of foreach. So I tried:

public void Save(Dictionary<string, string> idToWebLocation)
{
    var tasks = new List<Task>();
    foreach (var kvp in idToWebLocation)
    {
        var innerKvp = kvp;
        var task = new Task(() => Download(innerKvp.Key, innerKvp.Value));
        task.Start();
        tasks.Add(task);
    }

    Task.WaitAll(tasks.ToArray());
}

And, to be safe:

public void Save(Dictionary<string, string> idToWebLocation)
    {
        var tasks = new List<Task>();
        foreach (var kvp in idToWebLocation)
        {
            var innerKvp = kvp;
            var id = innerKvp.Key;
            var loc = innerKvp.Value;
            var task = new Task(() => Download(id, loc));
            task.Start();
            tasks.Add(task);
        }

        Task.WaitAll(tasks.ToArray());
    }

Also, because who knows:

public void Save(Dictionary<string, string> idToWebLocation)
{
    var tasks = new List<Task>();
    foreach (var kvp in idToWebLocation)
    {
        var innerKvp = kvp;
        var task = new Task(() =>
        {
            var id = innerKvp.Key;
            var loc = innerKvp.Value;
            Download(id, loc);
        });

        task.Start();
        tasks.Add(task);
    }

    Task.WaitAll(tasks.ToArray());
}

But neither of those worked. Clearly my understanding of how this code gets compiled is lacking, but I mean, what the hell is going on.

It seems like somewhere between var filename = $"{id}.html"; and source = GetSource(location); that location is changing. I'm pretty sure the code is thread safe, there is no shared state, right?

But obviously it's not, because when I iterate through the dictionary synchronously everything works exactly as expected.

Maybe I'm missing some fundamental point here, regarding enclosures or threading or memory or whatever. I don't know, but my desk is covered in hair and I'm approaching baldness.

Upvotes: 3

Views: 917

Answers (3)

Tim Destan
Tim Destan

Reputation: 2028

Do you still get the problem if you change the code to use async/await instead of constructing the tasks yourself?

class ClosureCraziness
{
    public string SaveFolder { get; set; }

    public void Save(Dictionary<string, string> idToWebLocation)
    {
        var tasks = new List<Task>();
        foreach (var kvp in idToWebLocation)
        {
            tasks.Add(Download(kvp.Key, kvp.Value));
        }

        Task.WaitAll(tasks.ToArray());
    }

    async Task Download(string id, string location)
    {
        var filename = $"{id}.html";
        string source = string.Empty;
        try
        {
            source = await GetSource(location);
        }
        catch (Exception e)
        {
            filename = "e-" + filename;
            var ex = e;
            while (ex != null)
            {
                source += ex.Message;
                source += Environment.NewLine;
                source += Environment.NewLine;
                source += ex.StackTrace;
                ex = ex.InnerException;
            }
        }

        var path = Path.Combine(SaveFolder, filename);
        using (var sw = new StreamWriter(path))
            await sw.WriteAsync(source);
    }

    async Task<String> GetSource(string location)
    {
        using (var client = new WebClient())
        {
            return await client.DownloadStringTaskAsync(location);
        }
    }
}

The only things I've changed from your original are to use the Task returning versions of Write and DownloadString, change your helper methods to return Tasks, and pepper in some asyncs and awaits to make the code compile. I don't have a compiler handy, but this should be pretty close to correct.

I don't actually see the problem with your original, but by encapsulating the task creation in a function that takes its inputs as parameters, we should be able to minimize the possibility of closure-related bugs creeping in.

Upvotes: 0

MutantNinjaCodeMonkey
MutantNinjaCodeMonkey

Reputation: 1249

The task parallel library has a for each method designed very much for the kind of thing you're doing. You might find that interesting/relevant to what you're currently trying to do:

https://msdn.microsoft.com/en-us/library/dd460720(v=vs.110).aspx

Upvotes: 1

Andrew Lobanov
Andrew Lobanov

Reputation: 13

I think you should try to create local variables of key and value before creating a task.

var key = kvp.Key;
var value = kvp.Value;
var task = new Task(() => Download(key, value));

Upvotes: 0

Related Questions