Reputation: 287
I am iterating over a large directory (500 GB) on a network share. I need to visit every file in the root and in all of its subdirectories. Here is my sample code:
    static void WalkDirectoryTree(DirectoryInfo root, DbContext dbcontext)
    {
        FileInfo[] files = null;
        DirectoryInfo[] subDirs = null;
        try
        {
            Console.WriteLine(DateTime.Now + " Listing files...");
            // GetFiles blocks until the whole array for this directory is built.
            files = root.GetFiles("*.*");
            Console.WriteLine(DateTime.Now + " Files obtained.");
        }
        catch (UnauthorizedAccessException)
        {
            // No permission to read this directory: skip it silently.
        }
        catch (System.IO.DirectoryNotFoundException e)
        {
            Debug.Print(e.Message);
        }
        if (files != null)
        {
            Console.WriteLine(DateTime.Now + " Iterating files...");
            foreach (System.IO.FileInfo fi in files)
            {
                Console.WriteLine(DateTime.Now + " Indexing [" + fi.FullName + "]...");
                IndexData index = new IndexData();
                index.attachementUID = fi.Name;
                dbcontext.IndexDatas.Add(index);
            }
            Console.WriteLine(DateTime.Now + " File iteration completed.");
            // Recurse into each subdirectory.
            subDirs = root.GetDirectories();
            foreach (System.IO.DirectoryInfo dirInfo in subDirs)
            {
                WalkDirectoryTree(dirInfo, dbcontext);
            }
        }
    }
Performance is very slow, even though I am only reading the file name and file path. Can you recommend something I can use to iterate over all the files on a network path? How can I improve the current code, and are there any System.IO improvements or alternatives?
Secondly, how can I keep track of my position in the file system? If the application crashes halfway through the iteration, how can I resume from the same position?
Upvotes: 0
Views: 289
Reputation: 62101
There is no real solution in the end: you can make the problem smaller, but you cannot make it go away. Getting FileInfo objects for a large number of files (file size is irrelevant) is a slow operation even locally; over the network it just takes time.
A 10 Gbit/s network can help a little, as can faster disks, but this is simply not an operation that is optimized for high throughput.
Upvotes: 1
Reputation: 29668
You should use DirectoryInfo.EnumerateFiles() rather than DirectoryInfo.GetFiles(). From MSDN:
The EnumerateFiles and GetFiles methods differ as follows: When you use EnumerateFiles, you can start enumerating the collection of names before the whole collection is returned; when you use GetFiles, you must wait for the whole array of names to be returned before you can access the array. Therefore, when you are working with many files and directories, EnumerateFiles can be more efficient.
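As a rough illustration of that difference, here is a minimal sketch of a lazy walker built on EnumerateFiles and EnumerateDirectories. The names LazyWalker, WalkFiles and SafeEnumerate are made up for this example and are not part of System.IO. It assumes you only want to skip directories that are unreadable or have disappeared, as the question's code does, and it uses an explicit stack instead of recursion.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration; not part of the framework.
static class LazyWalker
{
    // Lazily yields every file under root. Files are handed to the caller
    // as the enumeration produces them, not after a full array is built.
    public static IEnumerable<FileInfo> WalkFiles(DirectoryInfo root)
    {
        var pending = new Stack<DirectoryInfo>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            DirectoryInfo dir = pending.Pop();

            foreach (FileInfo file in SafeEnumerate(() => dir.EnumerateFiles()))
                yield return file;

            foreach (DirectoryInfo sub in SafeEnumerate(() => dir.EnumerateDirectories()))
                pending.Push(sub);
        }
    }

    // Wraps an enumeration so that an unreadable or missing directory ends
    // the sequence instead of throwing. The enumerator is advanced by hand
    // because C# does not allow "yield return" inside a try block that has
    // a catch clause.
    private static IEnumerable<T> SafeEnumerate<T>(Func<IEnumerable<T>> open)
    {
        IEnumerator<T> it = null;
        try
        {
            it = open().GetEnumerator();
        }
        catch (Exception e) when (e is UnauthorizedAccessException || e is DirectoryNotFoundException)
        {
            // Leave "it" as null; the directory is skipped below.
        }
        if (it == null) yield break;

        using (it)
        {
            while (true)
            {
                bool moved;
                try
                {
                    moved = it.MoveNext();
                }
                catch (Exception e) when (e is UnauthorizedAccessException || e is DirectoryNotFoundException)
                {
                    moved = false; // stop enumerating this directory
                }
                if (!moved) yield break;
                yield return it.Current;
            }
        }
    }
}
```

The caller would replace the recursive method with a single loop such as `foreach (FileInfo fi in LazyWalker.WalkFiles(root)) { ... }` and do the indexing inside it. This does not make the network listing itself any faster; it only lets processing start before a directory has been fully listed and avoids holding large arrays in memory.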
Upvotes: 5