Reputation: 3161
I have hundreds of thousands of small text files, between 0 and 8 KB each, on a LAN network share. I can use interop calls into kernel32.dll with FindFirstFileEx to recursively pull a list of the fully qualified UNC path of each file and store the paths in memory in a collection class such as List<string>. Using this approach I was able to populate the List<string> fairly quickly (about 30 seconds per 50k file names, compared to 3 minutes with Directory.GetFiles).
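For reference, here is a minimal sketch of that kind of interop crawl. It uses the plain FindFirstFile/FindNextFile pair rather than the Ex variant, and the error handling and filtering are simplified; treat it as an outline of the approach, not the exact code I run:

using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
struct WIN32_FIND_DATA
{
    public FileAttributes dwFileAttributes;
    public System.Runtime.InteropServices.ComTypes.FILETIME ftCreationTime;
    public System.Runtime.InteropServices.ComTypes.FILETIME ftLastAccessTime;
    public System.Runtime.InteropServices.ComTypes.FILETIME ftLastWriteTime;
    public uint nFileSizeHigh;
    public uint nFileSizeLow;
    public uint dwReserved0;
    public uint dwReserved1;
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
    public string cFileName;
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
    public string cAlternateFileName;
}

static class FastCrawl
{
    static readonly IntPtr INVALID_HANDLE_VALUE = new IntPtr(-1);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern IntPtr FindFirstFile(string lpFileName, out WIN32_FIND_DATA data);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode)]
    static extern bool FindNextFile(IntPtr hFindFile, out WIN32_FIND_DATA data);

    [DllImport("kernel32.dll")]
    static extern bool FindClose(IntPtr hFindFile);

    // Recursively collects the fully qualified path of every file under dir.
    public static void Crawl(string dir, List<string> files)
    {
        WIN32_FIND_DATA data;
        IntPtr handle = FindFirstFile(Path.Combine(dir, "*"), out data);
        if (handle == INVALID_HANDLE_VALUE)
            return; // empty or inaccessible; real code would check Marshal.GetLastWin32Error()
        try
        {
            do
            {
                if (data.cFileName == "." || data.cFileName == "..")
                    continue;
                string full = Path.Combine(dir, data.cFileName);
                if ((data.dwFileAttributes & FileAttributes.Directory) != 0)
                    Crawl(full, files);  // descend into subdirectory
                else
                    files.Add(full);     // record the file's UNC path
            } while (FindNextFile(handle, out data));
        }
        finally
        {
            FindClose(handle);
        }
    }
}

A call like FastCrawl.Crawl(@"\\server\share", myList) (with a placeholder share path) fills the list in one pass.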
Once I've crawled the directories and stored the file paths in the List<string>, I want to make a pass over every path in the list, read the contents of each small text file, and perform some action based on the values read in. As a test bed, I iterated over each file path in a List<string> holding 42,945 paths to this LAN network share and ran the following lines on each FileFullPath:
using (StreamReader file = new StreamReader(FileFullPath))
{
    file.ReadToEnd(); // read the full contents; discarded in this test
}
With just these lines, the run takes 13 to 15 minutes for all 42,945 file paths stored in my list.
Is there a more efficient way to load many small text files in C#? Is there some interop call I should consider, or is this about the best I can expect? It just seems like an awfully long time.
Upvotes: 3
Views: 205
Reputation: 564323
I would consider using Directory.EnumerateFiles and processing the files as you read them. This avoids storing the entire list of 42,945 paths at once, and it opens up the possibility of doing some of the processing in parallel via PLINQ (depending on the processing requirements of the files).
If the processing accounts for a reasonably large CPU portion of the total time (i.e., it isn't purely I/O bound), this could substantially reduce the total time required.
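For illustration, a minimal sketch of that pattern; the UNC root is a placeholder, and the final Count stands in for whatever per-file action you actually need:

using System;
using System.IO;
using System.Linq;

class StreamingReader
{
    static void Main()
    {
        const string root = @"\\server\share"; // placeholder UNC root

        // EnumerateFiles yields paths lazily, so reading begins before the
        // directory walk finishes; AsParallel lets PLINQ spread the per-file
        // work (read + process) across cores.
        int processed = Directory
            .EnumerateFiles(root, "*", SearchOption.AllDirectories)
            .AsParallel()
            .Select(path => File.ReadAllText(path))
            .Count(contents => contents.Length > 0); // stand-in for real processing

        Console.WriteLine("Processed {0} non-empty files.", processed);
    }
}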
Upvotes: 3