Michael Kay

Reputation: 163458

"Too many open files in system" failure while listing a recursive directory structure

I've implemented (in Java) a fairly straightforward Iterator to return the names of the files in a recursive directory structure, and after about 2300 files it failed with "Too many open files in system" (the failure actually occurred while trying to load a class, but I assume the directory listing was the culprit).

The data structure maintained by the iterator is a Stack holding the contents of the directories that are open at each level.

The actual logic is fairly basic:

```java
// Nested inside the enclosing class; java.io.File, java.io.FilenameFilter,
// java.util.Iterator and java.util.Stack must be imported there.
private static class DirectoryIterator implements Iterator<String> {

    private Stack<File[]> directories;                       // listing of each open directory level
    private FilenameFilter filter;
    private Stack<Integer> positions = new Stack<Integer>(); // current index into each level's listing
    private boolean recurse;
    private String next = null;
    private int count = 0;                                   // progress counter for the diagnostic output

    public DirectoryIterator(Stack<File[]> directories, boolean recurse, FilenameFilter filter) {
        this.directories = directories;
        this.recurse = recurse;
        this.filter = filter;
        positions.push(0);
        advance();
    }

    public boolean hasNext() {
        return next != null;
    }

    public String next() {
        String s = next;
        advance();
        return s;
    }

    public void remove() {
        throw new UnsupportedOperationException();
    }

    private void advance() {
        if (directories.isEmpty()) {
            next = null;
        } else {
            File[] files = directories.peek();
            // Pop every level whose listing has been fully consumed
            while (positions.peek() >= files.length) {
                directories.pop();
                positions.pop();
                if (directories.isEmpty()) {
                    next = null;
                    return;
                }
                files = directories.peek();
            }
            File nextFile = files[positions.peek()];
            if (nextFile.isDirectory()) {
                int p = positions.pop() + 1;
                positions.push(p);
                if (recurse) {
                    directories.push(nextFile.listFiles(filter));
                    positions.push(0);
                    advance();
                } else {
                    advance();
                }
            } else {
                next = nextFile.toURI().toString();
                count++;
                if (count % 100 == 0) {
                    System.err.println(count + "  " + next);
                }
                int p = positions.pop() + 1;
                positions.push(p);
            }
        }
    }
}
```

I would like to understand how many "open files" this requires. Under what circumstances is this algorithm "opening" a file, and when does it get closed again?

I've seen some neat code using Java 7 or Java 8, but I'm constrained to Java 6.

Upvotes: 7

Views: 1365

Answers (2)

Michael Kay

Reputation: 163458

Thanks everyone for the help and advice. I established that the problem is actually in what is being done with the files after they are returned by the iterator: the "client" code is opening the files as they are delivered, and is not tidying up properly. It's complicated by the fact that the files coming back are actually being processed in parallel.
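Because the code is restricted to Java 6, which has no try-with-resources, one way the client side might guarantee that every file it opens is closed is a try/finally block around each use. This is a minimal sketch, assuming the client reads each URI returned by the iterator through an InputStream; the class name ClientSide and the countBytes method are hypothetical stand-ins for whatever processing is actually done. With parallel processing, the same pattern applies inside each worker: the thread that opens a stream is the one that closes it.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;

public class ClientSide {

    /** Hypothetical client operation: reads the file at a file: URI and returns its length in bytes. */
    public static long countBytes(String uri) throws IOException {
        InputStream in = new FileInputStream(new File(URI.create(uri)));
        try {
            byte[] buffer = new byte[8192];
            long total = 0;
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
            return total;
        } finally {
            // Runs on normal return and on exception alike, so the
            // file descriptor is released even if processing fails.
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        for (String uri : args) {
            System.out.println(uri + ": " + countBytes(uri) + " bytes");
        }
    }
}
```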

I've also rewritten the DirectoryIterator, which I share in case anyone is interested:

```java
// Nested inside the enclosing class; java.io.File, java.io.FilenameFilter,
// java.util.Arrays, java.util.Iterator and java.util.Stack must be imported there.
private static class DirectoryIterator implements Iterator<String> {

    private Stack<Iterator<File>> directories; // one iterator per open directory level
    private FilenameFilter filter;
    private boolean recurse;
    private String next = null;

    public DirectoryIterator(Stack<Iterator<File>> directories, boolean recurse, FilenameFilter filter) {
        this.directories = directories;
        this.recurse = recurse;
        this.filter = filter;
        advance();
    }

    public boolean hasNext() {
        return next != null;
    }

    public String next() {
        String s = next;
        advance();
        return s;
    }

    public void remove() {
        throw new UnsupportedOperationException();
    }

    private void advance() {
        if (directories.isEmpty()) {
            next = null;
        } else {
            Iterator<File> files = directories.peek();
            // Discard exhausted levels until one still has entries
            while (!files.hasNext()) {
                directories.pop();
                if (directories.isEmpty()) {
                    next = null;
                    return;
                }
                files = directories.peek();
            }
            File nextFile = files.next();
            if (nextFile.isDirectory()) {
                if (recurse) {
                    File[] children = nextFile.listFiles(filter);
                    // listFiles returns null if the directory cannot be read
                    if (children != null) {
                        directories.push(Arrays.asList(children).iterator());
                    }
                }
                advance();
            } else {
                next = nextFile.toURI().toString();
            }
        }
    }
}
```
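A self-contained sketch of how the rewritten iterator might be seeded and driven follows. It restates the iterator with advance() written as a loop instead of a recursive call, on the assumption that a long run of directory entries with no intervening file would otherwise add one stack frame per directory; it also skips directories for which listFiles() returns null. It stays within Java 6 (no diamond operator, no try-with-resources). The class name DirectoryListing and the listAll helper are hypothetical.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Stack;

public class DirectoryListing {

    // Same design as the DirectoryIterator above, with advance() as a loop.
    static class DirectoryIterator implements Iterator<String> {
        private final Stack<Iterator<File>> directories;
        private final FilenameFilter filter;
        private final boolean recurse;
        private String next = null;

        DirectoryIterator(Stack<Iterator<File>> directories, boolean recurse, FilenameFilter filter) {
            this.directories = directories;
            this.recurse = recurse;
            this.filter = filter;
            advance();
        }

        public boolean hasNext() {
            return next != null;
        }

        public String next() {
            if (next == null) {
                throw new NoSuchElementException();
            }
            String s = next;
            advance();
            return s;
        }

        public void remove() {
            throw new UnsupportedOperationException();
        }

        private void advance() {
            next = null;
            while (!directories.isEmpty()) {
                Iterator<File> files = directories.peek();
                if (!files.hasNext()) {
                    directories.pop();          // this level is exhausted
                    continue;
                }
                File nextFile = files.next();
                if (nextFile.isDirectory()) {
                    File[] children = recurse ? nextFile.listFiles(filter) : null;
                    if (children != null) {     // null means unreadable (or not recursing)
                        directories.push(Arrays.asList(children).iterator());
                    }
                } else {
                    next = nextFile.toURI().toString();
                    return;
                }
            }
        }
    }

    /** Hypothetical helper: seeds the stack with the root's entries and drains the iterator. */
    public static List<String> listAll(File root, boolean recurse, FilenameFilter filter) {
        Stack<Iterator<File>> stack = new Stack<Iterator<File>>();
        File[] top = root.listFiles(filter);
        if (top != null) {
            stack.push(Arrays.asList(top).iterator());
        }
        List<String> result = new ArrayList<String>();
        Iterator<String> it = new DirectoryIterator(stack, recurse, filter);
        while (it.hasNext()) {
            result.add(it.next());
        }
        return result;
    }

    public static void main(String[] args) {
        File root = new File(args.length > 0 ? args[0] : ".");
        for (String uri : listAll(root, true, null)) {
            System.out.println(uri);
        }
    }
}
```

Passing a null filter accepts every entry, as documented for File.listFiles(FilenameFilter).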

Upvotes: 1

Michael Paddon

Reputation: 343

When you call nextFile.listFiles(), an underlying file descriptor is opened to read the directory. There is no way to explicitly close this descriptor, so you are relying on garbage collection. As your code descends a deep tree, it is essentially collecting a stack of nextFile instances which can't be garbage collected.

Step 1: set nextFile = null before calling advance(). This releases the object for garbage collection.

Step 2: you may need to call System.gc() after nulling nextFile to encourage quick garbage collection. Unfortunately, there is no way to force GC.

Step 3: you may need to increase the open file limit on your operating system. On Linux this may be done with ulimit(1).

If you can migrate to Java 7 or later, then DirectoryStream will solve your problem. Instead of using nextFile.listFiles(), use Files.newDirectoryStream(nextFile.toPath()) to get a DirectoryStream. You can then iterate over the stream and close() it to release the operating system resources. Each returned Path can be converted back to a File with toFile(). However, you might prefer to refactor to use Path throughout instead of File.
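A sketch of that Java 7+ approach, with recursion added as an assumption to match the question's use case. The try-with-resources statement calls close() on each DirectoryStream when its block exits, including on an exception, which is the explicit release described above. The class name DirectoryStreamListing and its methods are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class DirectoryStreamListing {

    /** Hypothetical helper: collects the URIs of all non-directory entries under dir, recursively. */
    public static List<String> listFiles(Path dir) throws IOException {
        List<String> result = new ArrayList<>();
        collect(dir, result);
        return result;
    }

    private static void collect(Path dir, List<String> result) throws IOException {
        List<Path> subdirs = new ArrayList<>();
        // The stream holds an open directory handle until close() is called;
        // try-with-resources guarantees that close() happens.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                if (Files.isDirectory(entry)) {
                    subdirs.add(entry);           // defer recursion until this stream is closed
                } else {
                    File file = entry.toFile();   // convert back to java.io.File if needed
                    result.add(file.toURI().toString());
                }
            }
        }
        for (Path sub : subdirs) {
            collect(sub, result);
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (String uri : listFiles(root)) {
            System.out.println(uri);
        }
    }
}
```

Collecting the subdirectories first and recursing only after the stream has been closed means at most one directory handle is open at any moment, however deep the tree is.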

Upvotes: 6
