Reputation: 486
I would like to scan a folder for subdirectories and files in C#, using Tasks from the TPL. The idea is to do it recursively and create one task per folder. The main thread should wait for the child tasks to finish and then print some info; in fact, I just want to know when scanning is finished. I started with the thread pool, then switched to the TPL and worked through some simple examples. After several attempts, going from simple code to increasingly bloated code, I'm stuck here:
private Logger log = LogManager.GetCurrentClassLogger();
public MediaObjectFolder MediaObjectFolder { get; set; }
private Queue<MediaObjectFolder> Queue { get; set; }
private object quelock, tasklock;
private List<Task> scanTasks;
public IsoTagger()
{
quelock = new object();
tasklock = new object();
scanTasks = new List<Task>();
MediaObjectFolder = new MediaObjectFolder(@"D:\Users\Roman\Music\Rock\temp");
Queue = new Queue<MediaObjectFolder>();
}
public MediaObject RescanFile(string fullpath, string filename)
{
return new MediaObject(fullpath);
}
public void Rescan()
{
Queue.Clear();
lock (tasklock)
{
Task scanFolderTask = Task.Factory.StartNew(ScanFolder, MediaObjectFolder);
scanTasks.Add(scanFolderTask);
}
Task.Factory.ContinueWhenAll(scanTasks.ToArray(), (ant) =>
{
if (log != null)
{
log.Debug("scan finished");
log.Debug("number of folders: {0}", Queue.Count);
}
});
}
private void ScanFolder(object o)
{
List<Task> subTasks = new List<Task>();
MediaObjectFolder mof = o as MediaObjectFolder;
log.Debug("thread - " + mof.Folder);
string[] subdirs = Directory.GetDirectories(mof.Folder);
string[] files = Directory.GetFiles(mof.Folder, "*.mp3");
foreach(string dir in subdirs)
{
log.Debug(dir);
MediaObjectFolder tmp = new MediaObjectFolder(dir);
lock (tasklock)
{
Task tmpTask = new Task(ScanFolder, tmp);
subTasks.Add(tmpTask);
}
}
foreach (Task tsk in subTasks)
{
tsk.Start();
}
foreach (string file in files)
{
log.Debug(file);
MediaObject tmp = new MediaObject(file);
MediaObjectFolder.MediaObjects.Add(tmp);
}
lock (quelock)
{
Queue.Enqueue(mof);
}
if (subTasks != null)
Task.Factory.ContinueWhenAll(subTasks.ToArray(), logTask => log.Debug("thread release - " + mof.Folder));
}
The main thread still sometimes continues too early, before all the other threads have finished. (I'm relatively new to C# and no expert in parallel programming either, so there may be some serious conceptual errors.)
Upvotes: 0
Views: 1124
Reputation: 486
After good suggestions by Servy and further research into parallelism in C#, I came up with an answer to my question. I don't really need LINQ for this simple task, where I just want to enumerate my filesystem and process the folders in parallel.
public void Scan()
{
// ...
// enumerate all directories under one root folder (mof.Folder)
var directories = Directory.EnumerateDirectories(mof.Folder, "*", SearchOption.AllDirectories);
// use parallel foreach from TPL to process folders
Parallel.ForEach(directories, ProcessFolder);
// ...
}
private void ProcessFolder(string folder)
{
if (!Directory.Exists(folder))
{
throw new ArgumentException("root folder does not exist!");
}
MediaObjectFolder mof = new MediaObjectFolder(folder);
IEnumerable<string> files = Directory.EnumerateFiles(folder, "*.mp3", SearchOption.TopDirectoryOnly);
foreach (string file in files)
{
MediaObject mo = new MediaObject(file);
mof.MediaObjects.Add(mo);
}
lock (quelock)
{
// add object to global queue
Queue.Enqueue(mof);
}
}
After quite intensive research, I found this to be the simplest solution. Please note: I haven't tested whether this approach is faster, as I'm working on a temporary set of files that isn't very big. This is also the approach described in the MSDN library for parallel processing of the filesystem.
PS: There is also a lot of room for performance improvement.
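One possible refinement, as a minimal sketch rather than a tested implementation: `Directory.EnumerateDirectories` with `SearchOption.AllDirectories` yields only the subdirectories, so files sitting directly in the root folder would be skipped unless the root is added to the sequence. The sketch below also swaps the lock-protected queue for a `ConcurrentQueue`. The class name `ParallelScanSketch` and the use of `KeyValuePair<string, List<string>>` in place of `MediaObjectFolder` are assumptions made to keep the example self-contained.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper; stands in for the Scan/ProcessFolder pair above.
public static class ParallelScanSketch
{
    // Returns one entry per folder: the folder path and the mp3 files directly inside it.
    public static ConcurrentQueue<KeyValuePair<string, List<string>>> Scan(string root)
    {
        var results = new ConcurrentQueue<KeyValuePair<string, List<string>>>();

        // AllDirectories yields only subdirectories, so prepend the root itself.
        IEnumerable<string> folders = new[] { root }.Concat(
            Directory.EnumerateDirectories(root, "*", SearchOption.AllDirectories));

        // Parallel.ForEach blocks until every folder has been processed.
        Parallel.ForEach(folders, folder =>
        {
            List<string> files = Directory
                .EnumerateFiles(folder, "*.mp3", SearchOption.TopDirectoryOnly)
                .ToList();

            // ConcurrentQueue is thread-safe, so no explicit lock is needed here.
            results.Enqueue(new KeyValuePair<string, List<string>>(folder, files));
        });

        return results;
    }
}
```

Because `Parallel.ForEach` returns only after all iterations complete, the caller knows the scan is finished as soon as `Scan` returns.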
Upvotes: 0
Reputation: 203825
The general approach that you're taking inherently makes this a fairly hard problem to solve. Instead, you can simply use the file system methods to traverse the hierarchy for you, and then use PLINQ to process those files in parallel effectively:
var directories = Directory.EnumerateDirectories(path, "*"
, SearchOption.AllDirectories);
var query = directories.AsParallel().Select(dir =>
{
var files = Directory.EnumerateFiles(dir, "*.mp3"
, SearchOption.TopDirectoryOnly);
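//TODO create custom object and add files
return files.ToList(); // placeholder: the lambda must return a value for Select to compile
}).ToList(); // PLINQ is lazy; ToList() runs the query and returns once every folder is processed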
Upvotes: 4
Reputation: 2578
You'll want to research the Task.WaitAll and Task.WaitAny methods. There is example code here: msdn.microsoft.com
For the quick answer:
Task.WaitAll(subTasks.ToArray());
should work for you.
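To show where that call might go in a recursive scan, here is a minimal sketch, assuming each folder's task waits for the tasks of its subfolders before returning. The class name `RecursiveScanSketch` and the `ConcurrentQueue<string>` of folder paths are hypothetical stand-ins for the question's `IsoTagger` and `MediaObjectFolder` queue. Blocking inside pool tasks like this can tie up thread pool threads on deep trees, so treat it as an illustration of the waiting logic, not a tuned design.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper illustrating Task.WaitAll in a recursive folder scan.
public static class RecursiveScanSketch
{
    // Scans one folder, starts a task per subdirectory, and waits for all of them.
    public static void ScanFolder(string folder, ConcurrentQueue<string> scanned)
    {
        var subTasks = new List<Task>();
        foreach (string dir in Directory.GetDirectories(folder))
        {
            string current = dir; // per-iteration copy captured by the lambda
            subTasks.Add(Task.Factory.StartNew(() => ScanFolder(current, scanned)));
        }

        // Record this folder; ConcurrentQueue is safe to use from many tasks.
        scanned.Enqueue(folder);

        // Blocks until every child task (and, recursively, its children) has finished.
        Task.WaitAll(subTasks.ToArray());
    }
}
```

With this shape, a call such as `RecursiveScanSketch.ScanFolder(path, queue)` returns only after the whole tree has been visited, so the "scan finished" message can be logged right after it.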
Upvotes: 0