Reputation: 11745
I'm selecting the previous day's files from a huge list of files:
// selecting around 80-120 files out of 20,000-25,000
FileInfo[] files = (new DirectoryInfo(dirPath)).GetFiles("*.xml");
string[] selectedFiles = (from c in files
where c.CreationTime >= DateTime.Today.AddDays(-1) && c.CreationTime < DateTime.Today.AddHours(-2.0)
select c.FullName).ToArray();
The above takes around 4-5 minutes to run. Can you please tell me how to optimize it without changing its functionality?
The selection window is from yesterday 00:00 up to (but not including) yesterday 22:00, as shown in the code above.
Kindly advise.
Upvotes: 0
Views: 98
Reputation: 8116
Don't instantiate a new FileInfo object for each file if you only need to know its CreationTime. Also, you don't need DirectoryInfo at all.
I'd use something like this:
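using System;
using System.Collections.Generic;
using System.IO;

DateTime lowDate = DateTime.Today.AddDays(-1);
DateTime highDate = DateTime.Today.AddHours(-2.0);
var filteredFileNames = new List<string>();
string[] fileNames = Directory.GetFiles(dirPath, "*.xml");
for (int i = 0; i < fileNames.Length; i++)
{
    // File.GetCreationTime returns local time, like FileInfo.CreationTime, so it is
    // comparable with the local DateTime.Today bounds (GetCreationTimeUtc would shift the window).
    DateTime creationTime = File.GetCreationTime(fileNames[i]);
    if (creationTime >= lowDate && creationTime < highDate)
    {
        filteredFileNames.Add(fileNames[i]);
    }
}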
If you're not I/O bound, you can still divide the time frame into parts handled by different Tasks/Threads (depending on which .NET version you're on) and accumulate the names at the end. However, most of the work is done by Directory.GetFiles, especially if it's a large directory.
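One possible way to spread the work across threads is sketched below, assuming .NET 4 or later for PLINQ. Note that it partitions the list of file names rather than the time frame, and it can only help if the per-file creation-time lookups are not I/O bound. `ParallelFileFilter` and `SelectCreatedBetween` are hypothetical names used for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ParallelFileFilter
{
    // Checks each file's local creation time on multiple threads via PLINQ.
    // AsOrdered() keeps the results in the same order as the input names.
    public static string[] SelectCreatedBetween(
        IEnumerable<string> fileNames, DateTime lowDate, DateTime highDate)
    {
        return fileNames
            .AsParallel()
            .AsOrdered()
            .Where(name =>
            {
                DateTime created = File.GetCreationTime(name);
                return created >= lowDate && created < highDate;
            })
            .ToArray();
    }
}
```

It would be called as `ParallelFileFilter.SelectCreatedBetween(Directory.GetFiles(dirPath, "*.xml"), DateTime.Today.AddDays(-1), DateTime.Today.AddHours(-2.0))`; the Directory.GetFiles call itself still runs on a single thread.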
When I had to handle large numbers of files in one directory, I used FindFirstFile/FindNextFile and FindClose from the Win32 API. They have much less overhead and are faster.
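A minimal sketch of the FindFirstFile/FindNextFile/FindClose approach, assuming Windows and P/Invoke into kernel32.dll. The creation time is read from the WIN32_FIND_DATA record returned during enumeration, so no extra call is made per file. `Win32FileEnumerator`, `FileTimeToLocal` and `SelectCreatedBetween` are hypothetical names, and error handling is deliberately minimal.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.IO;
using System.Runtime.InteropServices;

// Windows-only sketch: enumerates a directory with FindFirstFile/FindNextFile/FindClose
// and filters by the creation time already present in WIN32_FIND_DATA.
public static class Win32FileEnumerator
{
    private const int ErrorFileNotFound = 2;          // ERROR_FILE_NOT_FOUND
    private const uint FileAttributeDirectory = 0x10; // FILE_ATTRIBUTE_DIRECTORY
    private static readonly IntPtr InvalidHandleValue = new IntPtr(-1);

    [StructLayout(LayoutKind.Sequential)]
    private struct FILETIME
    {
        public uint Low;  // dwLowDateTime
        public uint High; // dwHighDateTime
    }

    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
    private struct WIN32_FIND_DATA
    {
        public uint dwFileAttributes;
        public FILETIME ftCreationTime;
        public FILETIME ftLastAccessTime;
        public FILETIME ftLastWriteTime;
        public uint nFileSizeHigh;
        public uint nFileSizeLow;
        public uint dwReserved0;
        public uint dwReserved1;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
        public string cFileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
        public string cAlternateFileName;
    }

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern IntPtr FindFirstFile(string lpFileName, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern bool FindNextFile(IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool FindClose(IntPtr hFindFile);

    // FILETIME is UTC in 100 ns ticks since 1601-01-01; DateTime.FromFileTime converts
    // it to local time, which is what FileInfo.CreationTime reports.
    public static DateTime FileTimeToLocal(uint low, uint high)
    {
        long fileTime = ((long)high << 32) | low;
        return DateTime.FromFileTime(fileTime);
    }

    public static List<string> SelectCreatedBetween(
        string dirPath, string pattern, DateTime lowDate, DateTime highDate)
    {
        var result = new List<string>();
        IntPtr handle = FindFirstFile(Path.Combine(dirPath, pattern), out WIN32_FIND_DATA data);
        if (handle == InvalidHandleValue)
        {
            int error = Marshal.GetLastWin32Error();
            if (error == ErrorFileNotFound) return result; // nothing matched the pattern
            throw new Win32Exception(error);
        }
        try
        {
            do
            {
                // Skip directories; keep files whose creation time is in [lowDate, highDate).
                if ((data.dwFileAttributes & FileAttributeDirectory) == 0)
                {
                    DateTime created = FileTimeToLocal(data.ftCreationTime.Low, data.ftCreationTime.High);
                    if (created >= lowDate && created < highDate)
                        result.Add(Path.Combine(dirPath, data.cFileName));
                }
            } while (FindNextFile(handle, out data)); // returns false at ERROR_NO_MORE_FILES (other errors not distinguished here)
        }
        finally
        {
            FindClose(handle);
        }
        return result;
    }
}
```

On .NET 4 and later, Directory.EnumerateFiles also streams results from the same Win32 calls, which may be a simpler first thing to measure before dropping to P/Invoke.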
Upvotes: 0
Reputation: 2332
Something to try:
FileInfo[] files = (new DirectoryInfo(dirPath)).GetFiles("*.xml");
DateTime lowDate = DateTime.Today.AddDays(-1);
DateTime highDate = DateTime.Today.AddHours(-2.0);
string[] selectedFiles = (from c in files
where c.CreationTime >= lowDate && c.CreationTime < highDate
select c.FullName).ToArray();
It's possible that those two dates were each being recalculated 20,000+ times, once per file.
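A sketch of the same idea with both bounds derived once from a single `today` snapshot. Apart from any speed difference, computing the bounds once also keeps them consistent if the query happens to run across midnight. `CreationWindow`, `ForDay` and `Select` are hypothetical names, and the file data is passed in as (name, creation time) pairs so the filter can be tried without touching the disk.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CreationWindow
{
    // Returns the window used in the question, [yesterday 00:00, yesterday 22:00),
    // derived from one "today" value.
    public static (DateTime Low, DateTime High) ForDay(DateTime today)
    {
        DateTime midnight = today.Date;
        return (midnight.AddDays(-1), midnight.AddHours(-2.0));
    }

    // Filters (name, creation time) pairs; the bounds are computed once, not per file.
    public static string[] Select(IEnumerable<(string Name, DateTime Created)> files, DateTime today)
    {
        var (low, high) = ForDay(today);
        return files
            .Where(f => f.Created >= low && f.Created < high)
            .Select(f => f.Name)
            .ToArray();
    }
}
```

With the original `files` array it would be called as `CreationWindow.Select(files.Select(f => (f.FullName, f.CreationTime)), DateTime.Today)`.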
Upvotes: 1