Jeremy Mc

Reputation: 119

Listing a very large number of files in a directory in C#

I'm trying to get a list of files in a specific directory that contains over 20 million files ranging from 2 to 20 KB each.
The problem is that my program throws an OutOfMemoryException every time, while tools like robocopy copy the folder to another directory with no problem at all. Here's the code I'm using to enumerate files:

List<string> files = new List<string>(Directory.EnumerateFiles(searchDir));

What should I do to solve this problem? Any help would be appreciated.

Upvotes: 2

Views: 4639

Answers (2)

Karatheodory

Reputation: 945

The other answer covers one directory level. To enumerate through multiple levels of directories, each containing a large number of subdirectories and files, one can do the following:

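// Requires: using System.Collections.Generic; using System.IO;
public IEnumerable<string> EnumerateFiles(string startingDirectoryPath) {
    // Each queue entry is a lazy sequence of directory paths still to visit.
    var directoryEnumerables = new Queue<IEnumerable<string>>();
    directoryEnumerables.Enqueue(new string[] { startingDirectoryPath });
    while (directoryEnumerables.Count > 0) {
        var currentDirectoryEnumerable = directoryEnumerables.Dequeue();
        foreach (var directory in currentDirectoryEnumerable) {
            // Directory.EnumerateFiles must be qualified here; the bare name
            // would resolve to this method and recurse without end.
            foreach (var filePath in Directory.EnumerateFiles(directory)) {
                yield return filePath;
            }
            // Defer the subdirectories; they are enumerated when dequeued.
            directoryEnumerables.Enqueue(Directory.EnumerateDirectories(directory));
        }
    }
}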

The function walks the directory tree breadth-first through lazy enumerators, so directory contents are read one entry at a time instead of being loaded all at once. The only thing left to solve is the depth of the hierarchy...
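One possible reading of the depth concern is capping how far the traversal descends. The following is only a sketch under that assumption: the class name BoundedTraversal, the method EnumerateFilesToDepth and its maxDepth parameter are hypothetical and not part of any library. Unlike the version above, it queues subdirectory paths as plain strings together with their depth, so the paths of pending directories (not files) are held in memory.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class BoundedTraversal
{
    // Hypothetical helper: yields files breadth-first, descending at most
    // maxDepth levels below the starting directory (0 = starting directory only).
    public static IEnumerable<string> EnumerateFilesToDepth(string startingDirectoryPath, int maxDepth)
    {
        var pending = new Queue<(string Dir, int Depth)>();
        pending.Enqueue((startingDirectoryPath, 0));
        while (pending.Count > 0)
        {
            var (directory, depth) = pending.Dequeue();

            // Files are streamed lazily, one path at a time.
            foreach (var filePath in Directory.EnumerateFiles(directory))
            {
                yield return filePath;
            }

            // Stop descending once the depth limit is reached.
            if (depth >= maxDepth) continue;

            foreach (var subDirectory in Directory.EnumerateDirectories(directory))
            {
                pending.Enqueue((subDirectory, depth + 1));
            }
        }
    }
}
```

Calling BoundedTraversal.EnumerateFilesToDepth(searchDir, 2) in a foreach loop would visit searchDir and two levels of subdirectories. Inaccessible directories still throw during enumeration; handling that is left out of this sketch.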

Upvotes: 1

Habib

Reputation: 223392

You are creating a list of 20 million objects in memory. I don't think you will ever use that, even if it becomes possible.

Instead, use Directory.EnumerateFiles(searchDir) directly and iterate over the items one by one.

Like this:

foreach(var file in Directory.EnumerateFiles(searchDir))
{
   //Copy to other location, or other stuff
}

With your current code, all 20 million path strings are loaded into memory first, and only then can you iterate over them or perform operations on them.

See: Directory.EnumerateFiles Method (String)

The EnumerateFiles and GetFiles methods differ as follows: When you use EnumerateFiles, you can start enumerating the collection of names before the whole collection is returned; when you use GetFiles, you must wait for the whole array of names to be returned before you can access the array. Therefore, when you are working with many files and directories, EnumerateFiles can be more efficient.
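If the per-file work is easier to do in groups (for example, handing chunks to a worker), the lazy sequence can still be consumed in bounded batches. This is a minimal sketch, not a definitive implementation: LazyFileListing, InBatches and CountFiles are hypothetical names introduced for illustration. At most batchSize paths are held in memory at a time.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LazyFileListing
{
    // Hypothetical helper: groups a lazy sequence of paths into lists of at
    // most batchSize items. Because this is an iterator, the argument check
    // runs when enumeration starts, not when the method is called.
    public static IEnumerable<List<string>> InBatches(IEnumerable<string> paths, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batch = new List<string>(batchSize);
        foreach (var path in paths)
        {
            batch.Add(path);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<string>(batchSize);
            }
        }

        // Emit the final, possibly smaller, batch.
        if (batch.Count > 0) yield return batch;
    }

    // Counts files without ever materializing the full list of names.
    public static long CountFiles(string searchDir)
    {
        long count = 0;
        foreach (var unused in Directory.EnumerateFiles(searchDir)) count++;
        return count;
    }
}
```

A caller could write foreach (var batch in LazyFileListing.InBatches(Directory.EnumerateFiles(searchDir), 1000)) and process each batch before the next one is read from disk.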

Upvotes: 9
