George2

Reputation: 45801

file enumerate issue

I am using VSTS 2008 + C# + .NET 3.5 to develop a console application. I need to enumerate the 50 most recent files in the current folder (to read file content and to get file metadata such as file name, creation time, etc.). Since the current folder has about 5,000 files, if I use the Directory.GetFiles API, all 5,000 files' metadata will be read into memory. I think that is wasteful since I only need to access the 50 most recent files.

Are there any solutions that access only the 50 most recent files in the current directory?

Upvotes: 0

Views: 441

Answers (2)

Fredrik Mörk

Reputation: 158349

This solution still loads metadata about all files, but I would say it's fast enough for most uses. The following code reports that it takes around 50ms to enumerate the 50 most recently updated files in my Windows\System32 directory (~2500 files). Unless the code is run very frequently I would probably not spend time optimizing it a lot more:

FileInfo[] files = (new DirectoryInfo(@"C:\WINDOWS\System32")).GetFiles();
Stopwatch sw = new Stopwatch();
sw.Start();
IEnumerable<FileInfo> recentFiles = files.OrderByDescending(
                                              fi => fi.LastWriteTime).Take(50);
List<FileInfo> list = recentFiles.ToList();
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);
list.ForEach(fi => Console.WriteLine(fi.Name));

Update

Based on the discussion in the comments regarding putting the date/time in the file name: note that Directory.GetFiles does not load metadata about the files; it simply returns a string array of file names (DirectoryInfo.GetFiles, on the other hand, returns an array of FileInfo objects). So, if you have the date and time in your file names (preferably in a format that sorts naturally, such as yyyyMMdd-HHmmss), you can use Directory.GetFiles to get the file names, sort them descending, and then pick the first 50 from the list:

string[] files = Directory.GetFiles(pathToLogFiles);
List<string> recentFiles = files.OrderByDescending(s => s)
                                .Take(50)
                                .ToList();

Then loop over the list and load whatever data you need from each file.
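A minimal sketch of that loop, assuming a hypothetical c:\logs directory and files small enough to read whole (both the path and the ReadAllText choice are assumptions, not part of the original answer):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class ReadNewest
{
    static void Main()
    {
        string pathToLogFiles = @"c:\logs"; // assumed location

        // Names only -- no FileInfo metadata is loaded here.
        List<string> recentFiles = Directory.GetFiles(pathToLogFiles)
            .OrderByDescending(s => s)
            .Take(50)
            .ToList();

        foreach (string path in recentFiles)
        {
            string content = File.ReadAllText(path);       // file content
            DateTime created = File.GetCreationTime(path); // metadata on demand
            Console.WriteLine("{0} ({1}): {2} chars",
                Path.GetFileName(path), created, content.Length);
        }
    }
}
```

Metadata calls like File.GetCreationTime are made per file here, so only the 50 selected files ever hit the file system a second time.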

Upvotes: 3

csharptest.net

Reputation: 64248

I'm really not sure it will be worth your while... consider the following program:

    class DateCompare : IComparer<FileInfo>
    {
        public int Compare(FileInfo a, FileInfo b)
        { 
            int result = a.LastWriteTime.CompareTo(b.LastWriteTime);
            if (result == 0)
                return StringComparer.OrdinalIgnoreCase.Compare(a.FullName, b.FullName);
            return result;
        }
    }

    public static void Main(string[] args)
    {
        DirectoryInfo root = new DirectoryInfo("c:\\Projects\\");
        DateTime start = DateTime.Now;
        long memory = GC.GetTotalMemory(false);
        FileInfo[] allfiles = root.GetFiles("*", SearchOption.AllDirectories);
        DateTime sortStart = DateTime.Now;
        List<FileInfo> files = new List<FileInfo>(20000);
        IComparer<FileInfo> cmp = new DateCompare();
        foreach (FileInfo file in allfiles)
        {
            int pos = ~files.BinarySearch(file, cmp);
            files.Insert(pos, file);
        }
        Console.WriteLine("Count = {0:#,###}, Read = {1}, Sort = {2}, Memory = {3:#,###}", files.Count, sortStart - start, DateTime.Now - sortStart, GC.GetTotalMemory(false) - memory);
    }

This is the output of the above program:

Count = 16,357, Read = 00:00:03.5793579, Sort = 00:00:06.7776777, Memory = 5,758,976
Count = 16,357, Read = 00:00:03.2173217, Sort = 00:00:06.1616161, Memory = 7,339,920
Count = 16,357, Read = 00:00:03.5083508, Sort = 00:00:06.7556755, Memory = 10,346,504

That runs the read in about 3 seconds, allocating between 5 and 10 MB, while crawling 6,931 directories and returning 16k file names. That is three times the volume you're talking about, and I bet most of the time goes to crawling the directory tree (I don't have a single directory with 5,000 files in it). The worst expense is always going to be the sort; if you can throw out files up front by matching on file names, I would recommend that.
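One sketch of that name-matching idea, assuming the log files carry a sortable date prefix such as yyyyMMdd (the naming pattern, the c:\logs path, and the 50-file cutoff are all assumptions for illustration):

```csharp
using System;
using System.IO;
using System.Linq;

class NewestByName
{
    static void Main()
    {
        // Assumed naming scheme: "20091231-error.log" etc. An ordinal
        // descending sort on the name alone then yields newest-first
        // without touching any file metadata or doing a date sort.
        string[] names = Directory.GetFiles(@"c:\logs", "????????-*.log");
        var newest50 = names
            .OrderByDescending(n => Path.GetFileName(n), StringComparer.Ordinal)
            .Take(50);
        foreach (string n in newest50)
            Console.WriteLine(n);
    }
}
```

The search pattern discards non-matching files inside the file system call itself, so the expensive sort only ever sees candidates that follow the naming scheme.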

Upvotes: 1
