Reputation: 1833
I was trying to scrape some data from a site. Here is my class:
class ClosureCraziness
{
public string SaveFolder { get; set; }
public void Save(Dictionary<string, string> idToWebLocation)
{
var tasks = new List<Task>();
foreach (var kvp in idToWebLocation)
{
var task = new Task(() => Download(kvp.Key, kvp.Value));
task.Start();
tasks.Add(task);
}
Task.WaitAll(tasks.ToArray());
}
void Download(string id, string location)
{
var filename = $"{id}.html";
string source = string.Empty;
try
{
source = GetSource(location);
}
catch (Exception e)
{
// handle exception
}
var path = Path.Combine(SaveFolder, filename);
using (var sw = new StreamWriter(path))
sw.Write(source);
}
string GetSource(string location)
{
using (var client = new WebClient())
{
return client.DownloadString(location);
}
}
}
When I executed, I would wind up with something like the following. You'll notice that the contents of the file (the source that was downloaded) does not match the name:
Filename on Disk | File Contents
apple.html <html> apple </html>
orange.html <html> orange </html>
pear.html <html> peach </html>
peach.html <html> peach </html>
grape.html <html> apple </html>
plum.html <html> plum </html>
(I can't figure out how to format this nicely)
At first I was baffled since the file name on the disk was correct, and I was sure my Dictionary<string, string>
was properly formed (I checked 6 times, all different ways), meaning the association of Id to web location was good.
I thought maybe it was a closure issue, having recalled Eric Lippert schooling me on the implementation of foreach. So I tried:
public void Save(Dictionary<string, string> idToWebLocation)
{
var tasks = new List<Task>();
foreach (var kvp in idToWebLocation)
{
var innerKvp = kvp;
var task = new Task(() => Download(innerKvp.Key, innerKvp.Value));
task.Start();
tasks.Add(task);
}
Task.WaitAll(tasks.ToArray());
}
And, to be safe:
public void Save(Dictionary<string, string> idToWebLocation)
{
var tasks = new List<Task>();
foreach (var kvp in idToWebLocation)
{
var innerKvp = kvp;
var id = innerKvp.Key;
var loc = innerKvp.Value;
var task = new Task(() => Download(id, loc));
task.Start();
tasks.Add(task);
}
Task.WaitAll(tasks.ToArray());
}
Also, because who knows:
public void Save(Dictionary<string, string> idToWebLocation)
{
var tasks = new List<Task>();
foreach (var kvp in idToWebLocation)
{
var innerKvp = kvp;
var task = new Task(() =>
{
var id = innerKvp.Key;
var loc = innerKvp.Value;
Download(id, loc);
});
task.Start();
tasks.Add(task);
}
Task.WaitAll(tasks.ToArray());
}
But neither of those worked. Clearly my understanding of how this code gets compiled is lacking, but I mean, what the hell is going on.
It seems like somewhere between var filename = $"{id}.html";
and source = GetSource(location);
that location
is changing. I'm pretty sure the code is thread safe, there is no shared state, right?
But obviously it's not, because when I iterate through the dictionary synchronously everything works exactly as expected.
Maybe I'm missing some fundamental point here, regarding enclosures or threading or memory or whatever. I don't know, but my desk is covered in hair and I'm approaching baldness.
Upvotes: 3
Views: 917
Reputation: 2028
Do you still get the problem if you change the code to use async/await instead of constructing the tasks yourself?
class ClosureCraziness
{
public string SaveFolder { get; set; }
public void Save(Dictionary<string, string> idToWebLocation)
{
var tasks = new List<Task>();
foreach (var kvp in idToWebLocation)
{
tasks.Add(Download(kvp.Key, kvp.Value));
}
Task.WaitAll(tasks.ToArray());
}
async Task Download(string id, string location)
{
var filename = $"{id}.html";
string source = string.Empty;
try
{
source = await GetSource(location);
}
catch (Exception e)
{
filename = "e-" + filename;
var ex = e;
while (ex != null)
{
source += ex.Message;
source += Environment.NewLine;
source += Environment.NewLine;
source += ex.StackTrace;
ex = ex.InnerException;
}
}
var path = Path.Combine(SaveFolder, filename);
using (var sw = new StreamWriter(path))
await sw.WriteAsync(source);
}
async Task<String> GetSource(string location)
{
using (var client = new WebClient())
{
return await client.DownloadStringTaskAsync(location);
}
}
}
The only things I've changed from your original are to use the Task
returning versions of Write
and DownloadString
, change your helper methods to return Task
s, and pepper in some async
s and await
s to make the code compile. I don't have a compiler handy, but this should be pretty close to correct.
I don't actually see the problem with your original, but by encapsulating the task creation in a function that takes its inputs as parameters, we should be able to minimize the possibility of closure-related bugs creeping in.
Upvotes: 0
Reputation: 1249
The task parallel library has a for each method designed very much for the kind of thing you're doing. You might find that interesting/relevant to what you're currently trying to do:
https://msdn.microsoft.com/en-us/library/dd460720(v=vs.110).aspx
Upvotes: 1
Reputation: 13
I think you should try to create local variables of key and value before creating a task.
var key = kvp.Key;
var value = kvp.Value;
var task = new Task(() => Download(key, value));
Upvotes: 0