Reputation: 43
I want to list all files in a directory and subdirectories within that directory that match a file mask.
For example "M:\SOURCE\*.doc" while SOURCE may look like this:
|-- SOURCE
| |-- Folder1
| | |-- File1.doc
| | |-- File1.txt
| |-- File2.doc
| |-- File3.xml
Should return File1.doc and File2.doc.
Initially I used a DirectoryStream, because it already validates the mask/glob syntax and can filter by it. This matters because the mask ISN'T just some regex but an actual file mask, which a regular user finds easier to understand:
Files.newDirectoryStream(path, mask);
The problem is that a DirectoryStream only checks the immediate directory you provide, not its subdirectories.
Then there is the "flattening" approach with Files.walk, which does look through all of the subdirectories. The problem is that it DOES NOT offer a way to filter by a file mask the way a DirectoryStream does:
Files.walk(path, Integer.MAX_VALUE);
So I'm stuck, unable to combine the best of both methods here...
Upvotes: 3
Views: 3113
Reputation: 491
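You can filter at the file-name string level with a regular expression:
private static List<Path> getListFiles(Path path, String regex) throws IOException {
    // Files.walk keeps directories open, so close the stream when done
    try (Stream<Path> paths = Files.walk(path)) {
        return paths.filter(p -> p.getFileName().toString().matches(regex)).toList();
    }
}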
Upvotes: 0
Reputation: 19545
It is possible to use a common Stream filter to retrieve the filtered file names from Files.walk, using String::matches with an appropriate regular expression:
final String SOURCE_DIR = "test";
Files.walk(Paths.get(SOURCE_DIR))
    .filter(p -> p.getFileName().toString().matches(".*\\.docx?"))
    .forEach(System.out::println);
Output:
test\level01\level11\test.doc
test\level02\test-level2.doc
test\t1.doc
test\t3.docx
Input directory structure:
│   t1.doc
│   t2.txt
│   t3.docx
│   t4.bin
│
├───level01
│   │   test.do
│   │
│   └───level11
│           test.doc
│
└───level02
        test-level2.doc
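To see why test.do is skipped while t3.docx is kept: String.matches has to match the whole file name, not just part of it. A minimal sketch using only the file names from the tree above (the class and method names are made up for illustration):

```java
import java.util.List;
import java.util.stream.Collectors;

final class RegexMaskDemo {
    // Keeps only the names that the regex matches in full (String.matches is anchored).
    static List<String> filterNames(List<String> names, String regex) {
        return names.stream()
                .filter(name -> name.matches(regex))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of("t1.doc", "t2.txt", "t3.docx", "t4.bin", "test.do");
        System.out.println(filterNames(names, ".*\\.docx?")); // prints [t1.doc, t3.docx]
    }
}
```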
Update
A recursive solution is possible using newDirectoryStream; however, its result needs to be converted into a Stream:
static Stream<Path> readFilesByMaskRecursively(Path start, String mask) {
    List<Stream<Path>> sub = new ArrayList<>();
    try {
        sub.add(StreamSupport.stream( // read files by mask in current dir
            Files.newDirectoryStream(start, mask).spliterator(), false));
        // descend into each subdirectory with the same mask
        Files.newDirectoryStream(start, (path) -> path.toFile().isDirectory())
            .forEach(path -> sub.add(readFilesByMaskRecursively(path, mask)));
    } catch (IOException ioex) {
        ioex.printStackTrace();
    }
    return sub.stream().flatMap(s -> s); // convert to Stream<Path>
}
// test
readFilesByMaskRecursively(Paths.get(SOURCE_DIR), "*.doc*")
    .forEach(System.out::println);
Output:
test\t1.doc
test\t3.docx
test\level01\level11\test.doc
test\level02\test-level2.doc
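Note that the DirectoryStream instances opened in the code above are never closed. If that matters in your case, here is a sketch of the same recursion that closes each one with try-with-resources. It collects eagerly into a List instead of returning a lazy Stream, and it additionally skips anything that is not a regular file. The names ClosingMaskLister and listByMask are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class ClosingMaskLister {
    // Collects regular files matching the glob mask in 'start' and all of its subdirectories.
    static List<Path> listByMask(Path start, String mask) throws IOException {
        List<Path> result = new ArrayList<>();
        // Entries in the current directory whose name matches the mask
        try (DirectoryStream<Path> matches = Files.newDirectoryStream(start, mask)) {
            for (Path p : matches) {
                if (Files.isRegularFile(p)) {
                    result.add(p);
                }
            }
        }
        // Recurse into each subdirectory with the same mask
        try (DirectoryStream<Path> dirs =
                     Files.newDirectoryStream(start, p -> Files.isDirectory(p))) {
            for (Path dir : dirs) {
                result.addAll(listByMask(dir, mask));
            }
        }
        return result;
    }
}
```

Usage would be something like ClosingMaskLister.listByMask(Paths.get(SOURCE_DIR), "*.doc*").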
Update 2
A prefix **/ may be added to the PathMatcher glob pattern to cross directory boundaries; the Files.walk-based solution may then use a simplified filter without the need to remove specific entries:
String mask = "*.doc*";
PathMatcher maskMatcher = FileSystems.getDefault().getPathMatcher("glob:**/" + mask);
Files.walk(Paths.get(SOURCE_DIR))
    .filter(path -> maskMatcher.matches(path))
    .forEach(System.out::println);
Output (same as in the recursive solution):
test\level01\level11\test.doc
test\level02\test-level2.doc
test\t1.doc
test\t3.docx
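The reason for the prefix is that a glob * does not cross directory separators, so a plain mask only matches a bare file name, never a path with parent directories. A small sketch to illustrate (GlobPrefixDemo and its matches helper are names I made up; the paths do not need to exist on disk):

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

final class GlobPrefixDemo {
    // Builds a glob matcher on the default file system and applies it to the given path.
    static boolean matches(String glob, Path path) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(path);
    }

    public static void main(String[] args) {
        Path nested = Paths.get("test", "level02", "test-level2.doc");
        System.out.println(matches("*.doc*", nested));               // false: * stops at separators
        System.out.println(matches("*.doc*", nested.getFileName())); // true: bare file name
        System.out.println(matches("**/*.doc*", nested));            // true: ** crosses directories
    }
}
```

One thing to be aware of: the **/ prefix needs at least one separator in the path, so a bare relative path such as t1.doc with no parent directory would not match it. With Files.walk this is not a problem, because every returned path starts with the start directory.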
Upvotes: 0
Reputation: 43
I think I might have solved my own question with the insight received here and from other questions that mention the PathMatcher object:
final PathMatcher maskMatcher = FileSystems.getDefault()
    .getPathMatcher("glob:" + mask);
final List<Path> matchedFiles = Files.walk(path)
    .collect(Collectors.toList());
final List<Path> filesToRemove = new ArrayList<>(matchedFiles.size());
matchedFiles.forEach(foundPath -> {
    if (!maskMatcher.matches(foundPath.getFileName()) || Files.isDirectory(foundPath)) {
        filesToRemove.add(foundPath);
    }
});
matchedFiles.removeAll(filesToRemove);
So basically, getPathMatcher("glob:" + mask) does the same thing the DirectoryStream was doing to filter the files.
All I have to do after that is filter the list of paths I get from Files.walk, removing the elements that do not match my PathMatcher or that are directories.
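The same idea can probably be done in one pass with Files.find, which hands each path and its attributes to a predicate, so no second "to remove" list is needed. This is only a sketch: MaskSearch and findByMask are hypothetical names, and it keeps regular files only, which is slightly stricter than "not a directory".

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class MaskSearch {
    // Returns all regular files under 'root' whose file name matches the glob mask.
    static List<Path> findByMask(Path root, String mask) throws IOException {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + mask);
        // try-with-resources closes the directories opened during the walk
        try (Stream<Path> found = Files.find(root, Integer.MAX_VALUE,
                (p, attrs) -> attrs.isRegularFile() && matcher.matches(p.getFileName()))) {
            return found.collect(Collectors.toList());
        }
    }
}
```

For the layout in the question, MaskSearch.findByMask(Paths.get("M:\\SOURCE"), "*.doc") should then give the two .doc files.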
Upvotes: 1
Reputation: 2513
You can also use a custom FileVisitor [1] in combination with a PathMatcher [2], which works well with globs.
The code might look like this:
public static void main(String[] args) throws IOException {
    System.out.println(getFiles(Paths.get("/tmp/SOURCE"), "*.doc"));
}
public static List<Path> getFiles(final Path directory, final String glob) throws IOException {
    final var docFileVisitor = new GlobFileVisitor(glob);
    Files.walkFileTree(directory, docFileVisitor);
    return docFileVisitor.getMatchedFiles();
}
public static class GlobFileVisitor extends SimpleFileVisitor<Path> {
    private final PathMatcher pathMatcher;
    private final List<Path> matchedFiles = new ArrayList<>();
    public GlobFileVisitor(final String glob) {
        this.pathMatcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
    }
    @Override
    public FileVisitResult visitFile(Path path, BasicFileAttributes basicFileAttributes) throws IOException {
        if (pathMatcher.matches(path.getFileName())) {
            matchedFiles.add(path);
        }
        return FileVisitResult.CONTINUE;
    }
    public List<Path> getMatchedFiles() {
        return matchedFiles;
    }
}
[1] https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/nio/file/FileVisitor.html
[2] https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/nio/file/PathMatcher.html
Upvotes: 3