Jeyson Ardila

Reputation: 43

List all Files from a Directory that match a File Mask (a.k.a Pattern or Glob)

I want to list all files in a directory and subdirectories within that directory that match a file mask.

For example, "M:\SOURCE\*.doc", where SOURCE may look like this:

|-- SOURCE
|   |-- Folder1
|   |   |-- File1.doc
|   |   |-- File1.txt
|   |-- File2.doc
|   |-- File3.xml

This should return File1.doc and File2.doc.

Initially I used a DirectoryStream, because it already validates the mask/glob syntax and can filter with it. This is NOT a regex but an actual file mask, which a regular user finds easier to understand.

Files.newDirectoryStream(path, mask);

The problem is that a DirectoryStream only checks the immediate directory you provide and not its subdirectories.

Then there is the "flattening" approach with Files.walk, which does look through all of the subdirectories. The problem is that it does NOT offer a way to filter by a file mask the way a DirectoryStream does.

Files.walk(path, Integer.MAX_VALUE);

So I'm stuck, unable to combine the best of both methods here...

Upvotes: 3

Views: 3113

Answers (4)

KUL

Reputation: 491

You can filter on the file name as a string, using a regular expression:

private static List<Path> getListFiles(Path path, String regex) throws Exception {
    return Files.walk(path).filter(p -> p.getFileName().toString().matches(regex)).toList();
}
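
A slightly more defensive variant of the same idea, as a sketch only: it closes the stream returned by Files.walk, skips directories, and compiles the regular expression once. The class name RegexFileLister and the method name listFiles are made up for illustration. Keep in mind that the argument is a regular expression, not a glob, so the mask *.doc from the question would have to be written as .*\.doc.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class; the names are illustrative only.
class RegexFileLister {

    // Returns all regular files under root whose file name matches the regex.
    static List<Path> listFiles(Path root, String regex) throws IOException {
        Pattern pattern = Pattern.compile(regex);
        // Files.walk keeps directories open, so close the stream when done.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> pattern.matcher(p.getFileName().toString()).matches())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

A call such as RegexFileLister.listFiles(Paths.get("M:\\SOURCE"), ".*\\.doc") would then cover the example from the question.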

Upvotes: 0

Nowhere Man

Reputation: 19545

It is possible to use a regular Stream filter to pick the matching file names out of Files.walk, using String::matches with an appropriate regular expression:

final String SOURCE_DIR = "test";

Files.walk(Paths.get(SOURCE_DIR))
     .filter(p -> p.getFileName().toString().matches(".*\\.docx?"))
     .forEach(System.out::println);

Output

test\level01\level11\test.doc
test\level02\test-level2.doc
test\t1.doc
test\t3.docx

Input directory structure:

│   t1.doc
│   t2.txt
│   t3.docx
│   t4.bin
│
├───level01
│   │   test.do
│   │
│   └───level11
│           test.doc
│
└───level02
        test-level2.doc

Update

A recursive solution is possible using newDirectoryStream; however, its result needs to be converted into a Stream:

static Stream<Path> readFilesByMaskRecursively(Path start, String mask) {
        
    List<Stream<Path>> sub = new ArrayList<>();
        
    try {
        sub.add(StreamSupport.stream( // read files by mask in current dir
                Files.newDirectoryStream(start, mask).spliterator(), false));
            
        Files.newDirectoryStream(start, (path) -> path.toFile().isDirectory())
             .forEach(path -> sub.add(readFilesByMaskRecursively(path, mask))); // recurse into subdirectories
    } catch (IOException ioex) {
        ioex.printStackTrace();
    }
        
    return sub.stream().flatMap(s -> s); // convert to Stream<Path>
}

// test
readFilesByMaskRecursively(Paths.get(SOURCE_DIR), "*.doc*")
             .forEach(System.out::println);

Output:

test\t1.doc
test\t3.docx
test\level01\level11\test.doc
test\level02\test-level2.doc

Update 2

A prefix **/ may be added to the glob pattern passed to getPathMatcher so that it crosses directory boundaries; the Files.walk-based solution can then filter on the whole path, without having to remove entries afterwards:

String mask = "*.doc*";
PathMatcher maskMatcher = FileSystems.getDefault().getPathMatcher("glob:**/" + mask);
Files.walk(Paths.get(SOURCE_DIR))
     .filter(path -> maskMatcher.matches(path))
     .forEach(System.out::println);

Output (same as in the recursive solution):

test\level01\level11\test.doc
test\level02\test-level2.doc
test\t1.doc
test\t3.docx
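
As an alternative to the **/ prefix, Files.find from the standard library can be combined with a plain "glob:" matcher applied to the file name only. The sketch below assumes that only regular files are wanted; the class name GlobFind and the method name findByMask are hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class; the names are illustrative only.
class GlobFind {

    // Lists regular files under root whose file name matches the glob mask.
    static List<Path> findByMask(Path root, String mask) throws IOException {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + mask);
        // Files.find hands each entry's attributes to the predicate,
        // so the regular-file check needs no separate file system call.
        try (Stream<Path> paths = Files.find(root, Integer.MAX_VALUE,
                (path, attrs) -> attrs.isRegularFile()
                        && matcher.matches(path.getFileName()))) {
            return paths.sorted().collect(Collectors.toList());
        }
    }
}
```

Because the matcher only sees the file name and the predicate checks the attributes, a directory that happens to be called something like archive.doc is not reported.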

Upvotes: 0

Jeyson Ardila

Reputation: 43

I think I might have solved my own question with the insight received here and from other questions mentioning the PathMatcher object:

final PathMatcher maskMatcher = FileSystems.getDefault()
                  .getPathMatcher("glob:" + mask);

final List<Path> matchedFiles = Files.walk(path)
                  .collect(Collectors.toList());

final List<Path> filesToRemove = new ArrayList<>(matchedFiles.size());

matchedFiles.forEach(foundPath -> {
            if (!maskMatcher.matches(foundPath.getFileName()) || Files.isDirectory(foundPath)) {
              filesToRemove.add(foundPath);
            }
          });

 matchedFiles.removeAll(filesToRemove);

So basically, getPathMatcher("glob:" + mask) does the same thing the DirectoryStream was doing to filter the files.

All I have to do after that is filter the list of paths I get from Files.walk, removing the elements that do not match my PathMatcher or that are directories.
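
The same filtering can probably be done inside the stream itself, which avoids building a second list and calling removeAll. This is a minimal sketch under that assumption; the class name MaskWalker and the method name listByMask are hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class; the names are illustrative only.
class MaskWalker {

    // Walks root recursively and keeps non-directory entries whose
    // file name matches the glob mask.
    static List<Path> listByMask(Path root, String mask) throws IOException {
        PathMatcher maskMatcher = FileSystems.getDefault().getPathMatcher("glob:" + mask);
        // try-with-resources closes the directories opened by Files.walk.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(p -> !Files.isDirectory(p))
                    .filter(p -> maskMatcher.matches(p.getFileName()))
                    .collect(Collectors.toList());
        }
    }
}
```

Since the mask uses glob syntax, brace groups also work, so a mask such as *.{doc,xml} would match both File2.doc and File3.xml from the question's example tree.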

Upvotes: 1

hradecek

Reputation: 2513

You can also use a custom FileVisitor [1] in combination with a PathMatcher [2], which works well with globs.

Code might look like this:

public static void main(String[] args) throws IOException {
    System.out.println(getFiles(Paths.get("/tmp/SOURCE"), "*.doc"));
}

public static List<Path> getFiles(final Path directory, final String glob) throws IOException {
    final var docFileVisitor = new GlobFileVisitor(glob);
    Files.walkFileTree(directory, docFileVisitor);

    return docFileVisitor.getMatchedFiles();
}

public static class GlobFileVisitor extends SimpleFileVisitor<Path> {

    private final PathMatcher pathMatcher;
    private List<Path> matchedFiles = new ArrayList<>();

    public GlobFileVisitor(final String glob) {
        this.pathMatcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
    }

    @Override
    public FileVisitResult visitFile(Path path, BasicFileAttributes basicFileAttributes) throws IOException {
        if (pathMatcher.matches(path.getFileName())) {
            matchedFiles.add(path);
        }
        return FileVisitResult.CONTINUE;
    }

    public List<Path> getMatchedFiles() {
        return matchedFiles;
    }
}

[1] https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/nio/file/FileVisitor.html

[2] https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/nio/file/PathMatcher.html
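
One thing to be aware of: SimpleFileVisitor rethrows the IOException passed to visitFileFailed, so a single unreadable entry aborts the whole walk. If skipping such entries is acceptable, a variant of the visitor could override that method. The following is a sketch under that assumption; the name TolerantGlobVisitor and the failedPaths list are illustrative additions, not part of the code above.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

// Hypothetical variant of a glob-matching visitor that skips unreadable entries.
class TolerantGlobVisitor extends SimpleFileVisitor<Path> {

    private final PathMatcher pathMatcher;
    private final List<Path> matchedFiles = new ArrayList<>();
    private final List<Path> failedPaths = new ArrayList<>();

    TolerantGlobVisitor(String glob) {
        this.pathMatcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
    }

    @Override
    public FileVisitResult visitFile(Path path, BasicFileAttributes attrs) {
        // Match against the file name only, as in the original visitor.
        if (pathMatcher.matches(path.getFileName())) {
            matchedFiles.add(path);
        }
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFileFailed(Path path, IOException exc) {
        // Record the entry and keep walking instead of rethrowing.
        failedPaths.add(path);
        return FileVisitResult.CONTINUE;
    }

    List<Path> getMatchedFiles() {
        return matchedFiles;
    }

    List<Path> getFailedPaths() {
        return failedPaths;
    }

    // Convenience wrapper mirroring a getFiles(directory, glob) style call.
    static List<Path> getFiles(Path directory, String glob) throws IOException {
        TolerantGlobVisitor visitor = new TolerantGlobVisitor(glob);
        Files.walkFileTree(directory, visitor);
        return visitor.getMatchedFiles();
    }
}
```

Whether silently skipping failures is the right choice depends on the use case; keeping the failed paths in a list at least makes it possible to report them afterwards.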

Upvotes: 3
