Wills
Wills

Reputation: 491

Finding the 3 most recently modified files in a long list of files

I have a file list which I want to sort and extract the top 3 last modified.

Constraint: I cannot use Java 7 due to compatibility issues on downstream apps

My current options

Solution 1

File[] files = directory.listFiles();    
Arrays.sort(files, new Comparator<File>(){
    public int compare(File f1, File f2)
    {
        return Long.valueOf(f1.lastModified()).compareTo(f2.lastModified());
    } });

Solution 2

public static void sortFilesDesc(File[] files) {
  Arrays.sort(files, new Comparator() {
    public int compare(Object o1, Object o2) {
      if ((File)o1).lastModified().compareTo((File)o2).lastModified()) {
        return -1;
      } else if (((File) o1).lastModified() < ((File) o2).lastModified()) {
        return +1;
      } else {
        return 0;
      }
    }
  });
}

Problem

The above two solution takes more time to execute & memory. My file list consists of some 300 tar files with 200MB size each. so it is consuming more time & memory.

Is there is any way to efficiently handle this?

Every compare operation uses a file object which is of high memory is there is any way to release the memory and handle this effectively?

Upvotes: 5

Views: 6689

Answers (4)

Alex Kreutznaer
Alex Kreutznaer

Reputation: 1170

You can do it much faster.

Arrays.sort(...) uses "quick sort", which takes ~ n * ln(n) operations.

This example takes only one iteration trough the whole array, which is ~ n operations.

public static void sortFilesDesc(File[] files) {        
    File firstMostRecent = null;
    File secondMostRecent = null;
    File thirdMostRecent = null;
    for (File file : files) {
        if ((firstMostRecent == null)
                || (firstMostRecent.lastModified() < file.lastModified())) {
            thirdMostRecent = secondMostRecent;
            secondMostRecent = firstMostRecent;             
            firstMostRecent = file;
        } else if ((secondMostRecent == null)
                || (secondMostRecent.lastModified() < file.lastModified())) {
            thirdMostRecent = secondMostRecent;
            secondMostRecent = file;
        } else if ((thirdMostRecent == null)
                || (thirdMostRecent.lastModified() < file.lastModified())) {
            thirdMostRecent = file;
        }
    }
} 

On small numbers of files you won't see much difference, but even for tens of files the difference will be significant, for bigger numbers - dramatic.

The code to check the algorithm (please put in a correct files structure):

package com.hk.basicjava.clasload.tests2;

import java.io.File;
import java.util.Date;


class MyFile extends File {

    private long time = 0; 

    public MyFile(String name, long timeMills) {
        super(name);
        time = timeMills;
    }
    @Override
    public long lastModified() {
        return time;
    }
}

public class Files {

    /**
     * @param args
     */
    public static void main(String[] args) {

        File[] files = new File[5]; 
        files[0] = new MyFile("File1", new Date(2013,1,15, 7,0).getTime());
        files[1] = new MyFile("File2", new Date(2013,1,15, 7,40).getTime());
        files[2] = new MyFile("File3", new Date(2013,1,15, 5,0).getTime());
        files[3] = new MyFile("File4", new Date(2013,1,15, 10,0).getTime());
        files[4] = new MyFile("File5", new Date(2013,1,15, 4,0).getTime());
        sortFilesDesc(files);
    }

    public static void sortFilesDesc(File[] files) {        
        File firstMostRecent = null;
        File secondMostRecent = null;
        File thirdMostRecent = null;
        for (File file : files) {
            if ((firstMostRecent == null)
                    || (firstMostRecent.lastModified() < file.lastModified())) {
                thirdMostRecent = secondMostRecent;
                secondMostRecent = firstMostRecent;             
                firstMostRecent = file;
            } else if ((secondMostRecent == null)
                    || (secondMostRecent.lastModified() < file.lastModified())) {
                thirdMostRecent = secondMostRecent;
                secondMostRecent = file;
            } else if ((thirdMostRecent == null)
                    || (thirdMostRecent.lastModified() < file.lastModified())) {
                thirdMostRecent = file;
            }
        }
        System.out.println("firstMostRecent : " + firstMostRecent.getName());
        System.out.println("secondMostRecent : " + secondMostRecent.getName());
        System.out.println("thirdMostRecent : " + thirdMostRecent.getName());
    } 

}

Upvotes: 5

SpaceTrucker
SpaceTrucker

Reputation: 13556

Your problem is that retrieving the last modified date is a relatively expensive operation, because it involves operating system logic. So if you don't mind to get the latest up-to-date values, you could wrap your files in a comparable class.

public class LastModifiedFile implements Comparable<LastModifiedFile> {

    private final File file;
    private final Date lastModified;

    public LastModifiedFile(File file) {
        this.file = file;
        lastModified = file.lastModified();
    }

    public int compareTo(LastModifiedFile other) {
        return lastModified.compareTo(other.lastModified);
    }
}

Note that a changing last modified date during your sorting will result in undefined behavior for many sorting algorithms. Java 7s Tim Sort implementation will throw an exception, if a last modified date changes and therefore comparisons result in different values.

Upvotes: 0

Evgeniy Dorofeev
Evgeniy Dorofeev

Reputation: 136042

I am for Solution 1, with some improvements

Arrays.sort(files, new Comparator<File>() {
        public int compare(File f1, File f2) {
            long d1 = f1.lastModified();
            long d2 = f2.lastModified();
            return d1 > d2 ? 1 : d1 < d2 ? -1 : 0;
        }
    });

to avoid unnecessary objects creation because of Long.valueOf(long).

File does not hold / read any file data but only the file path, there is no performance / memory issue with it. The only time consuming operation here is reading modification time from file system which cannot be avoided.

Upvotes: 0

Aleksander Blomsk&#248;ld
Aleksander Blomsk&#248;ld

Reputation: 18552

You have to check the lastModified of every file, you cannot change that. What you don't need to do is to sort all the elements just to get the top 3. If you can use Guava, you could use Ordering.greatestOf (which uses a good algorithm):

Ordering<File> ordering = Ordering.from( new Comparator(){
        public int compare(File f1, File f2)
        {
            return Long.valueOf(f1.lastModified()).compareTo(f2.lastModified());
        });

List<File> max3 = ordering.greatestOf(Arrays.asList(directory.listFiles()), 3);

Upvotes: 3

Related Questions