Akkad

Reputation: 639

Java List Directory With Sorting Performance Issue

I am building a file explorer where I sort directories and files by name (case-insensitive), with directories ordered before files. I am using the following code, but its performance is slow. Is there another way to accomplish this?

File[] directories = new File(path).listFiles(File::isDirectory);
File[] files = new File(path).listFiles(File::isFile);

Arrays.sort(directories, Comparator.comparing(File::getName, String.CASE_INSENSITIVE_ORDER));
Arrays.sort(files, Comparator.comparing(File::getName, String.CASE_INSENSITIVE_ORDER));

File[] list = new File[directories.length + files.length];

System.arraycopy(directories, 0, list, 0, directories.length);  
System.arraycopy(files, 0, list, directories.length, files.length); 

Upvotes: 0

Views: 487

Answers (3)

Akkad

Reputation: 639

I went in another direction since I am developing for Linux: I used Java's Process API to run Linux file-listing commands, and it performs very well. Here is a sample in case it helps:

List<String> output = new ArrayList<>();
try
{
    // The pipes and && require a shell; ProcessBuilder alone does not parse them,
    // so the command is passed to /bin/sh -c. Note \\n so that find (not Java) sees \n.
    String cmd = "find /SomePath -maxdepth 1 -mindepth 1 -type d -printf '%f\\n' | sort && find /SomePath -maxdepth 1 -not -type d -printf '%f\\n' | sort";
    Process p = new ProcessBuilder("/bin/sh", "-c", cmd).start();
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(p.getInputStream())))
    {
        String line;
        while ((line = reader.readLine()) != null) output.add(line);
    }
    p.waitFor();
}
catch (IOException | InterruptedException ex)
{
    ex.printStackTrace(); // at minimum, do not swallow the exception silently
}

Upvotes: 0

DuncG

Reputation: 15116

If you have a huge number of files in a folder, or are using non-SSD storage, the calls to File.listFiles() and File.isDirectory() (or isFile()) can make the folder scan quite slow. Doing the scan in one step is possible, but sorting by directory / file would still repeat the isDirectory()/isFile() calls.

Instead, look at implementing this with Files.find, which lets you read all the file attributes at the same time so that the sort does not re-read file system attributes. It is neatly handled in one stream. Here is an example which just prints the sorted items and modification times of the current folder:

public static Stream<Map.Entry<Path, BasicFileAttributes>>
find(Path dir, int maxDepth, BiPredicate<Path, BasicFileAttributes> matcher, FileVisitOption... options) throws IOException {

    // ConcurrentHashMap is safe to use with a parallel() stream
    ConcurrentHashMap<Path,BasicFileAttributes> attrs = new ConcurrentHashMap<>();

    BiPredicate<Path, BasicFileAttributes> predicate = (p,a) -> (matcher == null || matcher.test(p, a)) && attrs.put(p, a) == null;
    return Files.find(dir, maxDepth, predicate, options).map(p -> Map.entry(p, attrs.remove(p)));
}

public static void main(String[] args) throws IOException {
    Path dir = Path.of(args[0]);
    int depth = 1;

    // Note it is easy to add more sort fields here:
    Comparator<Entry<Path, BasicFileAttributes>> compDirs = Comparator.comparing(entry -> entry.getValue().isRegularFile());
    Comparator<Entry<Path, BasicFileAttributes>> comparing = compDirs.thenComparing(entry -> entry.getKey().getFileName().toString(), String.CASE_INSENSITIVE_ORDER);

    try(var files = find(dir, depth, (p,a) -> true)) {
        files.sorted(comparing)
             .forEach(entry 
             -> System.out.println((entry.getValue().isDirectory() ? "DIR  ":"FILE ")+entry.getKey() +" modified "+entry.getValue().lastModifiedTime()));
    }
}

It can be made into a tree scan by changing depth to Integer.MAX_VALUE.

If you'd rather not use .forEach(), just save the stream as a list and process it with a regular for loop:

try(var files = find(dir, depth, (p,a) -> true)) {
    for (var entry : files.sorted(comparing).toList()) {
        BasicFileAttributes attr = entry.getValue();
        Path path = entry.getKey();
        System.out.println((attr.isDirectory() ? "DIR  ":"FILE ")+path +" modified "+attr.lastModifiedTime());
    }
}

Upvotes: 1

Chaosfire

Reputation: 6985

First of all, you should benchmark to figure out where the exact bottleneck is. How do I write a correct micro-benchmark in Java? should be a good start.

Now, an idea that might help.
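As a rough first check (not a substitute for a proper micro-benchmark, and a sketch of my own, not the asker's code), you can time the listing and the sorting separately to see which phase dominates:

```java
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;

public class RoughTiming {
    public static void main(String[] args) {
        String path = args.length > 0 ? args[0] : "."; // folder to scan; "." is just a placeholder
        long t0 = System.nanoTime();
        File[] all = new File(path).listFiles();
        long t1 = System.nanoTime();
        if (all == null) {
            System.err.println("Not a readable directory: " + path);
            return;
        }
        // Directories first (false sorts before true), then case-insensitive by name
        Arrays.sort(all, Comparator.comparing(File::isFile)
                .thenComparing(File::getName, String.CASE_INSENSITIVE_ORDER));
        long t2 = System.nanoTime();
        System.out.printf("list: %d ms, sort: %d ms%n", (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```

If the listing phase dominates, no amount of comparator tuning will help, and the suggestions below about reducing disk accesses are where to focus.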

  1. Get all files and directories in a single array. This way you get files and folders with a single operation instead of two. Accessing files and folders on disk is not exactly a fast operation, so you want to reduce those calls. Also, if you look at the implementation of listFiles(FileFilter), it iterates over everything in the folder to find matches, so that's one less iteration over all elements.
File[] directoriesAndFiles = folder.listFiles();
  2. As mentioned in another answer, isFile and isDirectory are slow, so calling them more than once is undesirable. You can define a wrapper class (a record in this case) to cache the results.
public record FileWrapper(File file, String name, boolean isDirectory, boolean isFile) {
}
  3. Write a composite comparator for the sort. Judging by your code, you want directories first.
public class FileTypeComparator implements Comparator<FileWrapper> {

    @Override
    public int compare(FileWrapper first, FileWrapper second) {
        if (first.isDirectory() && second.isFile()) {
            return -1;
        }
        if (first.isFile() && second.isDirectory()) {
            return 1;
        }
        return 0;
    }
}
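As a side note (a sketch of my own, using a simplified stand-in for the FileWrapper record): since Boolean.compare orders false before true, the same directories-first ordering can be derived from Comparator.comparing on the isFile flag instead of a hand-written comparator:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class DirsFirstSketch {
    // Simplified stand-in for the FileWrapper record above (name + type flags only)
    record FileWrapper(String name, boolean isDirectory, boolean isFile) {}

    public static void main(String[] args) {
        // false (directory) sorts before true (file), then case-insensitive by name
        Comparator<FileWrapper> cmp = Comparator.comparing(FileWrapper::isFile)
                .thenComparing(FileWrapper::name, String.CASE_INSENSITIVE_ORDER);

        List<FileWrapper> items = new ArrayList<>(List.of(
                new FileWrapper("b.txt", false, true),
                new FileWrapper("Src", true, false),
                new FileWrapper("a.txt", false, true),
                new FileWrapper("docs", true, false)));
        items.sort(cmp);
        items.forEach(i -> System.out.println(i.name()));
        // prints: docs, Src, a.txt, b.txt
    }
}
```

Either form works; the explicit FileTypeComparator class is arguably clearer about intent, while the comparing form is shorter.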

Combine it with your by-name comparator:

Comparator<FileWrapper> compositeComparator = new FileTypeComparator()
    .thenComparing(FileWrapper::name, String.CASE_INSENSITIVE_ORDER);

Then build the result array with a stream:

File[] result = Arrays.stream(directoriesAndFiles)
        .map(file -> new FileWrapper(file, file.getName(), file.isDirectory(), file.isFile())) //map to wrapper class to cache result of expensive methods
        .filter(fw -> fw.isFile() || fw.isDirectory()) //filter out not needed elements
        .sorted(compositeComparator) //sort with composite comparator
        .map(FileWrapper::file) //extract files from the wrapper
        .toArray(File[]::new); //collect in array

Additionally, to avoid building the initial array, you can use Files.list() to get a lazily populated stream.

try (Stream<Path> stream = Files.list(Path.of("initial directory"))) {
    File[] result = stream
            .map(Path::toFile)
            .map(file -> new FileWrapper(file, file.getName(), file.isDirectory(), file.isFile()))
            .filter(fw -> fw.isFile() || fw.isDirectory())
            .sorted(compositeComparator)
            .map(FileWrapper::file)
            .toArray(File[]::new);
} catch (IOException exc) {
    //handle
}

Upvotes: 1
