General Grey

Reputation: 3688

Retrieve a list of filenames in folder and all subfolders quickly

I need to get a list of all Word documents (*.doc and *.docx) stored in a Windows folder that has many subfolders, sub-subfolders, and so on.

Searching for a file with C# has an answer that works, but it is two years old and takes 10 seconds to search through 1,500 files (in the future there may be 10,000 or more). I will post my code, which is basically a copy from the above link. Does anyone have a better solution?

// Time how long it takes to find all *.doc* files under MainFolder
DateTime dt = DateTime.Now;
DirectoryInfo dir = new DirectoryInfo(MainFolder);
List<FileInfo> matches =
    new List<FileInfo>(dir.GetFiles("*.doc*", SearchOption.AllDirectories));
TimeSpan ts = DateTime.Now - dt;
MessageBox.Show(matches.Count + " matches in " + ts.TotalSeconds + " seconds");

Upvotes: 2

Views: 3432

Answers (4)

Nicolas

Reputation: 6494

First, I suggest you use Stopwatch instead of DateTime to measure the elapsed time.
Second, to make your search faster, don't copy the result of GetFiles into a List; work with the returned array directly.
Finally, tighten your search pattern: since you want every .doc and .docx file, try "*.doc?".
Here is my suggestion:

var sw = new Stopwatch();
sw.Start();

var matches = Directory.GetFiles(MainFolder, "*.doc?", SearchOption.AllDirectories);

sw.Stop();
MessageBox.Show(matches.Length + " matches in " + sw.Elapsed.TotalSeconds + " seconds");

Upvotes: 1

Tony Hopkinson

Reputation: 20320

Doubt there's much you can do with that.

dir.GetFiles("*.doc|*.docx", SearchOption.AllDirectories) might have an impact, in that it's a more restrictive pattern.

Upvotes: 2

Paul

Reputation: 6228

If you want the full list then, other than making sure the Windows Indexing Service is enabled on the target folders, not really. Your main delay is going to be reading from the hard drive, and no amount of optimizing your C# code will make that process any faster. You could create your own simple indexing service, perhaps using a FileSystemWatcher; that would give you sub-second response times no matter how many documents are added.
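The idea above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not production code: the class name DocumentIndex is invented here, it assumes the watched volume stays mounted, and it ignores FileSystemWatcher's buffer-overflow and error events, which a real index would have to handle.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Linq;

// Hypothetical sketch: seed an in-memory set of .doc/.docx paths with one full
// scan, then let a FileSystemWatcher keep it current so later "searches" never
// touch the disk.
class DocumentIndex
{
    private readonly ConcurrentDictionary<string, byte> _paths =
        new ConcurrentDictionary<string, byte>(StringComparer.OrdinalIgnoreCase);
    private readonly FileSystemWatcher _watcher;

    public DocumentIndex(string root)
    {
        // One-time full scan to seed the index (this still costs disk time once).
        foreach (var path in Directory.EnumerateFiles(
                     root, "*.doc*", SearchOption.AllDirectories))
            _paths.TryAdd(path, 0);

        // Keep the index in sync as files come and go.
        _watcher = new FileSystemWatcher(root, "*.doc*")
        {
            IncludeSubdirectories = true
        };
        _watcher.Created += (s, e) => _paths.TryAdd(e.FullPath, 0);
        _watcher.Deleted += (s, e) => _paths.TryRemove(e.FullPath, out _);
        _watcher.Renamed += (s, e) =>
        {
            _paths.TryRemove(e.OldFullPath, out _);
            _paths.TryAdd(e.FullPath, 0);
        };
        _watcher.EnableRaisingEvents = true;
    }

    // Sub-second "search": just snapshot the current set of known paths.
    public string[] Snapshot() => _paths.Keys.ToArray();
}
```

After construction, calling Snapshot() returns immediately from memory regardless of how many documents exist on disk.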

Upvotes: 1

Reed Copsey

Reputation: 564441

You can use Directory.EnumerateFiles instead of GetFiles. This has the advantage of returning the files as an IEnumerable<T>, which allows you to begin your processing of the result set immediately (instead of waiting for the entire list to be returned).

If you're merely counting the number of files or listing them all, it may not help. If, however, you can do your processing and/or filtering of the results as they stream in, and especially if you can do any of it on other threads, it can be significantly faster.

From the documentation:

The EnumerateFiles and GetFiles methods differ as follows: When you use EnumerateFiles, you can start enumerating the collection of names before the whole collection is returned; when you use GetFiles, you must wait for the whole array of names to be returned before you can access the array. Therefore, when you are working with many files and directories, EnumerateFiles can be more efficient.
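For example, a streaming version of the question's code might look like the sketch below. It assumes the same MainFolder variable and WinForms context as the question; the explicit extension check is an extra illustration, since "*.doc*" can also match extensions such as .docm.

```csharp
using System;
using System.IO;
using System.Linq;

// Process each path as the enumeration yields it, instead of waiting for
// GetFiles to build the complete array first.
int count = 0;
foreach (var path in Directory
             .EnumerateFiles(MainFolder, "*.doc*", SearchOption.AllDirectories)
             .Where(p => p.EndsWith(".doc", StringComparison.OrdinalIgnoreCase)
                      || p.EndsWith(".docx", StringComparison.OrdinalIgnoreCase)))
{
    // Each path is available immediately; start your per-file work here
    // (or hand it off to another thread) rather than after the full scan.
    count++;
}
MessageBox.Show(count + " matches");
```

The total disk time is the same, but the first result arrives as soon as it is found, which is where the practical speedup comes from.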

Upvotes: 5
