Pratik

Reputation: 11745

selecting previous day files from huge list of files

I'm selecting the previous day's files from a huge list of files.

// selecting around 80-120 files from 20,000 - 25,000 

FileInfo[] files = (new DirectoryInfo(dirPath)).GetFiles("*.xml");
string[] selectedFiles = (from c in files
                          where c.CreationTime >= DateTime.Today.AddDays(-1) && c.CreationTime < DateTime.Today.AddHours(-2.0)
                          select c.FullName).ToArray();

The above takes around 4-5 minutes to run. Can you please tell me how to optimize it without changing its functionality?

// file selection is between yesterday 0:00 and yesterday 22:00

as shown in the code above.
Kindly advise.

Upvotes: 0

Views: 98

Answers (2)

Alex

Reputation: 8116

Don't instantiate a new FileInfo object for each file if you only need to know the CreationTime. Also, you don't have to use DirectoryInfo.

I'd use something like this:

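// Requires: using System; using System.Collections.Generic; using System.IO;
DateTime lowDate = DateTime.Today.AddDays(-1);     // yesterday 00:00, local time
DateTime highDate = DateTime.Today.AddHours(-2.0); // yesterday 22:00, local time

var filteredFileNames = new List<string>();
string[] fileNames = Directory.GetFiles(dirPath, "*.xml");

for (int i = 0; i < fileNames.Length; i++)
{
    // Local creation time, so it compares correctly against the local bounds above
    // (GetCreationTimeUtc would shift the window by the UTC offset).
    var creationTime = File.GetCreationTime(fileNames[i]);
    if (creationTime >= lowDate && creationTime < highDate)
    {
        filteredFileNames.Add(fileNames[i]);
    }
}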

If you're not I/O bound, you can also split the time frame across several Tasks or threads (depending on which .NET version you're on) and combine the names at the end. However, most of the work is done in Directory.GetFiles, especially if it's a large directory.

When I had to handle large numbers of files in one directory, I ended up using FindFirstFile / FindNextFile and FindClose from the Win32 API. They have much less overhead and are faster.

FindFirstFile Implementation
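
If you'd rather stay in managed code, here is a minimal sketch of a different technique from the P/Invoke route above: streaming the directory with DirectoryInfo.EnumerateFiles. It assumes .NET 4.0 or later. The class and method names (PreviousDayFiles, Select) are made up for illustration. As far as I know, on Windows the FileInfo objects it yields are filled in from the same find data the enumeration returns, so reading CreationTime should not need an extra lookup per file, but I haven't measured this for your case, so treat it as something to benchmark rather than a guaranteed win.

```csharp
using System;
using System.IO;
using System.Linq;

static class PreviousDayFiles
{
    // Hypothetical helper: returns the full paths of the *.xml files in dirPath
    // whose local creation time falls in the half-open range [low, high).
    public static string[] Select(string dirPath, DateTime low, DateTime high)
    {
        return new DirectoryInfo(dirPath)
            .EnumerateFiles("*.xml") // lazy: entries are filtered as they are read
            .Where(f => f.CreationTime >= low && f.CreationTime < high)
            .Select(f => f.FullName)
            .ToArray();
    }
}
```

You would call it with the same bounds as before, computed once: PreviousDayFiles.Select(dirPath, DateTime.Today.AddDays(-1), DateTime.Today.AddHours(-2.0)).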

Upvotes: 0

Dan Pichelman

Reputation: 2332

Something to try:

FileInfo[] files = (new DirectoryInfo(dirPath)).GetFiles("*.xml");

DateTime lowDate = DateTime.Today.AddDays(-1);
DateTime highDate = DateTime.Today.AddHours(-2.0);

string[] selectedFiles = (from c in files
                          where c.CreationTime >= lowDate && c.CreationTime < highDate
                          select c.FullName).ToArray();

It's possible that those dates were being calculated 20,000+ times, each.
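
To see why, here is a small sketch that counts how often a bound written inside the predicate gets evaluated. LowDate is a hypothetical stand-in for DateTime.Today.AddDays(-1) that increments a counter, and the class and method names are made up for the demo. It only shows the per-element evaluation; it doesn't tell you how much of the 4-5 minutes that accounts for.

```csharp
using System;
using System.Linq;

static class PredicateEvaluationDemo
{
    static int calls;

    // Hypothetical stand-in for DateTime.Today.AddDays(-1) that counts its calls.
    static DateTime LowDate()
    {
        calls++;
        return DateTime.Today.AddDays(-1);
    }

    // Returns how many times the bound was computed while filtering itemCount items.
    public static int CountBoundEvaluations(int itemCount)
    {
        calls = 0;
        DateTime[] stamps = Enumerable.Repeat(DateTime.Now, itemCount).ToArray();

        // The lambda runs once per element, so LowDate() runs once per element too.
        stamps.Count(s => s >= LowDate());
        return calls;
    }
}
```

CountBoundEvaluations(20000) returns 20000, whereas hoisting the bound into a local variable, as in the code above, computes it once.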

Upvotes: 1
