Reputation: 413
I've been working on refactoring a process that iterates over a collection of FileClass
objects that have a Filename
, NewFilename
, and a string[] FileReferences
property, and replaces all FileReferences
in which reference the old filename with the new one. The code below is slightly simplified, in that the real file references property is not just a list of the file names- they are lines which may contain a file name somewhere in them, or not. The current code is ok when the _fileClass
collection is below around 1000 objects... but is painfully slow if there are more objects, or the file references property has thousands.
Following the answers on this post: Run two async tasks in parallel and collect results in .NET 4.5 (and several like it). I've been trying to make a async method which would take a list of all the old and new file names as well as an individual FileClass
, then build an array of these Task<FileClass>
and try to process them in parallel via Task.WhenAll()
. But running into a "Cannot await void" error. I believe it's due to the Task.Run(() => ...);
but removing the () =>
causes further problems.
It's an older code base, and I can't let the async propagate further than the calling code (in this case, Main
, as I have found in some other examples. Nor can I use C#8's async foreach due to a .Net 4.5 limitation.
class Program
{
private static List<FileClass> _fileClasses;
static void Main(string[] args)
{
var watch = new Stopwatch();
_fileClasses = GetFileClasses();
watch.Start();
ReplaceFileNamesAsync();
watch.Stop();
Console.WriteLine($"Async Elapsed Ticks: {watch.ElapsedTicks}");
watch.Reset();
//watch.Start();
//ReplaceFileNamesSLOW();
//watch.Stop();
//Console.WriteLine($"Slow Elapsed Ticks: {watch.ElapsedTicks}");
Console.ReadLine();
}
public static async void ReplaceFileNamesAsync()
{
var newOldFilePairs = _fileClasses.Select(p => new NewOldFilePair() { OldFile = p.Filename, NewFile = p.NewFilename }).ToArray();
var tasks = new List<Task<FileClass>>();
foreach (var file in _fileClasses)
{
tasks.Add(ReplaceFileNamesAsync(newOldFilePairs, file));
}
//Red underline "Cannot await void".
FileClass[] result = await Task.WaitAll(tasks.ToArray());
}
private static async Task<FileClass> ReplaceFileNamesAsync(NewOldFilePair[] fastConfigs, FileClass fileClass)
{
foreach (var config in fastConfigs)
{
//I suspect this is part of the issue.
await Task.Run(() => fileClass.ReplaceFileNamesInFileReferences(config.OldFile, config.NewFile));
}
return fileClass;
}
public static void ReplaceFileNamesSLOW()
{
// Current Technique
for (var i = 0; i < _fileClasses.Count; i++)
{
var oldName = _fileClasses[i].Filename;
var newName = _fileClasses[i].NewFilename;
for (var j = 0; j < _fileClasses.Count; j++)
{
_fileClasses[j].ReplaceFileNamesInFileReferences(oldName, newName);
}
}
}
public static List<FileClass> GetFileClasses(int numberToGet = 2000)
{
//helper method to build a bunch of FileClasses
var fileClasses = new List<FileClass>();
for (int i = 0; i < numberToGet; i++)
{
fileClasses.Add(new FileClass()
{
Filename = $@"C:\fake folder\fake file_{i}.ext",
NewFilename = $@"C:\some location\sub folder\fake file_{i}.ext"
});
}
var fileReferences = fileClasses.Select(p => p.Filename).ToArray();
foreach (var fileClass in fileClasses)
{
fileClass.FileReferences = fileReferences;
}
return fileClasses;
}
}
public class NewOldFilePair
{
public string OldFile { get; set; }
public string NewFile { get; set; }
}
public class FileClass
{
public string Filename { get; set; }
public string NewFilename { get; set; }
public string[] FileReferences { get; set; }
//Or this might be the void it doesn't like.
public void ReplaceFileNamesInFileReferences(string oldName, string newName)
{
if (FileReferences == null) return;
if (FileReferences.Length == 0) return;
for (var i = 0; i < FileReferences.Length; i++)
{
if (FileReferences[i] == oldName) FileReferences[i] = newName;
}
}
}
Update
If others find this question and actually need to implement something similar to above, there were some potential pitfalls worth mentioning. Obviously, I had a typo for Task.WaitAll()
vs Task.WhenAll()
(I blame VS autocomplete, and maybe me rushing to make a scratch app 😉). Secondly, once the code was "working", I found out that while async was cutting down the time to get through this, it was not completing the entire list of tasks (as they could be in the thousands) before continuing onto the next stage of the process. This resulted in a Task.Run(() => ReplaceFileNamesAsync()).Wait()
call, which actually took longer than the nested loop method. Unpacking and merging the results back into the _fileClasses
property also required some logic, which contributed to the problem.
The Parallel.ForEach
was a much quicker process, and while I didn't see the updated code posted below, I ended up with largely the same result (other than the dictionary).
Upvotes: 0
Views: 6316
Reputation: 81493
To solve the initial problem, is you should be using await Task.WhenAll
not Task.WaitAll
Creates a task that will complete when all of the supplied tasks have completed.
However, this looks like more of a job for Parallel.ForEach
Another issue is you looping over the same list twice (nested) which is a quadratic time complexity and is definitely not thread safe
As a solve, you could create a dictionary of the changes, loop over the change set once (in parallel), and update the references in one go.
_fileClasses = GetFileClasses();
// create a dictionary for fast lookup
var changes = _fileClasses.Where(x => x.Filename != null && x.NewFilename != null)
.ToDictionary(x => x.Filename, x => x.NewFilename);
// parallel the workloads
Parallel.ForEach(_fileClasses, (item) =>
{
// iterate through the references
for (var i = 0; i < item.FileReferences.Length; i++)
{
// check for updates
if (changes.TryGetValue(item.FileReferences[i], out var value))
item.FileReferences[i] = value;
}
});
Note : This wasn't intended to be a complete solution as all the code wasn't supplied, however the time complexity should be a lot better
Upvotes: 4
Reputation: 67
Try use Task.WhenAll
. It will allow you to await it because it returns a task.
Task.WaitAll
is a blocking call and will wait till all the tasks are finished before returning void.
Upvotes: 1