Reputation: 44193
Is there a good "scala-esque" (I guess I mean functional) way of recursively listing files in a directory? What about matching a particular pattern?
For example, recursively list all files matching "a*.foo" in c:\temp.
Upvotes: 102
Views: 74614
Reputation: 37
Get all files under a path, excluding directories:
import java.io.File
import scala.collection.mutable.ArrayBuffer
object pojo2pojo {
def main(args: Array[String]): Unit = {
val file = new File("D:\\tmp\\tmp")
val files = recursiveListFiles(file)
println(files.toList)
// List(D:\tmp\tmp\1.txt, D:\tmp\tmp\a\2.txt)
}
def recursiveListFiles(f: File):ArrayBuffer[File] = {
val all = collection.mutable.ArrayBuffer(f.listFiles:_*)
val files = all.filter(_.isFile)
val dirs = all.filter(_.isDirectory)
files ++ dirs.flatMap(recursiveListFiles)
}
}
Upvotes: 0
Reputation: 13
Minor improvement to the accepted answer.
By partitioning on _.isDirectory, this function returns a list of files only (directories are excluded).
import java.io.File
def recursiveListFiles(f: File): Array[File] = {
val (dir, files) = f.listFiles.partition(_.isDirectory)
files ++ dir.flatMap(recursiveListFiles)
}
Upvotes: 0
Reputation: 31
The deepFiles method of scala.reflect.io.Directory provides a pretty nice way of recursively getting all the files in a directory:
import scala.reflect.io.Directory
new Directory(f).deepFiles.filter(x => x.name.startsWith("a") && x.name.endsWith(".foo"))
deepFiles returns an iterator, so you can convert it to some other collection type if you don't need or want lazy evaluation.
Upvotes: 1
Reputation: 19338
os-lib is the easiest way to recursively list files in Scala.
os.walk(os.pwd/"countries").filter(os.isFile(_))
Here's how to recursively list all the files that match the "a*.foo" pattern specified in the question:
os.walk(os.pwd/"countries").filter(_.segments.toList.last matches "a.*\\.foo")
os-lib is more elegant and powerful than the alternatives. It returns os objects that you can easily move, rename, and so on. You don't need to suffer with the clunky Java libraries anymore.
Here's a code snippet you can run if you'd like to experiment with this library on your local machine:
os.makeDir(os.pwd/"countries")
os.makeDir(os.pwd/"countries"/"colombia")
os.write(os.pwd/"countries"/"colombia"/"medellin.txt", "q mas pues")
os.write(os.pwd/"countries"/"colombia"/"a_something.foo", "soy un rolo")
os.makeDir(os.pwd/"countries"/"brasil")
os.write(os.pwd/"countries"/"brasil"/"a_whatever.foo", "carnaval")
os.write(os.pwd/"countries"/"brasil"/"a_city.txt", "carnaval")
println(os.walk(os.pwd/"countries").filter(os.isFile(_)))
will return this:
ArraySeq(
/.../countries/brasil/a_whatever.foo,
/.../countries/brasil/a_city.txt,
/.../countries/colombia/a_something.foo,
/.../countries/colombia/medellin.txt)
os.walk(os.pwd/"countries").filter(_.segments.toList.last matches "a.*\\.foo")
will return this:
ArraySeq(
/.../countries/brasil/a_whatever.foo,
/.../countries/colombia/a_something.foo)
See here for more details on how to use the os-lib.
Upvotes: 2
Reputation: 14655
I would prefer a solution with Streams, because Streams are lazily evaluated collections: you can iterate over an arbitrarily large file system without building the whole list first.
import java.io.File
def getFileTree(f: File): Stream[File] =
f #:: (if (f.isDirectory) f.listFiles().toStream.flatMap(getFileTree)
else Stream.empty)
Example for searching
getFileTree(new File("c:\\main_dir")).filter(_.getName.endsWith(".scala")).foreach(println)
Upvotes: 48
Reputation: 3709
As of Java 1.7, everyone should be using java.nio. It offers close-to-native performance (java.io is very slow) and has some useful helpers.
But Java 1.8 introduces exactly what you are looking for:
import java.nio.file.{FileSystems, Files}
import scala.collection.JavaConverters._
val dir = FileSystems.getDefault.getPath("/some/path/here")
Files.walk(dir).iterator().asScala.filter(Files.isRegularFile(_)).foreach(println)
You also asked for file matching. Try java.nio.file.Files.find and also java.nio.file.Files.newDirectoryStream.
See documentation here: http://docs.oracle.com/javase/tutorial/essential/io/walk.html
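As a minimal sketch of the Files.find approach mentioned above, the following combines it with a glob PathMatcher so that the "a*.foo" pattern from the question can be used directly. The object and method names (FindByGlob, findByGlob) are hypothetical, chosen only for illustration; the stream is closed explicitly because Files.find holds open directory handles.

```scala
import java.nio.file.{FileSystems, Files, Path}
import java.nio.file.attribute.BasicFileAttributes
import java.util.function.BiPredicate

object FindByGlob {
  // Hypothetical helper: returns all regular files under `root`
  // whose file name matches the given glob, e.g. "a*.foo".
  def findByGlob(root: Path, glob: String): List[Path] = {
    val matcher = FileSystems.getDefault.getPathMatcher("glob:" + glob)
    val predicate = new BiPredicate[Path, BasicFileAttributes] {
      override def test(p: Path, attrs: BasicFileAttributes): Boolean =
        attrs.isRegularFile && matcher.matches(p.getFileName)
    }
    val stream = Files.find(root, Int.MaxValue, predicate)
    try {
      // Copy the results out before the stream is closed.
      val builder = List.newBuilder[Path]
      val it = stream.iterator()
      while (it.hasNext) builder += it.next()
      builder.result()
    } finally stream.close()
  }
}
```

For the question's example this would be called as FindByGlob.findByGlob(java.nio.file.Paths.get("c:\\temp"), "a*.foo").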
Upvotes: 35
Reputation: 1
You can use tail recursion for it:
object DirectoryTraversal {
import java.io._
def main(args: Array[String]): Unit = {
val dir = new File("C:/Windows")
val files = scan(dir)
val out = new PrintWriter(new File("out.txt"))
files foreach { file =>
out.println(file)
}
out.flush()
out.close()
}
def scan(file: File): List[File] = {
@scala.annotation.tailrec
def sc(acc: List[File], files: List[File]): List[File] = {
files match {
case Nil => acc
case x :: xs => {
x.isDirectory match {
case false => sc(x :: acc, xs)
case true => sc(acc, xs ::: x.listFiles.toList)
}
}
}
}
sc(List(), List(file))
}
}
Upvotes: 0
Reputation: 50506
No one has mentioned https://github.com/pathikrit/better-files yet:
import better.files._
val dir = "src"/"test"
val matches: Iterator[File] = dir.glob("**/*.{java,scala}")
// the code above is equivalent to:
dir.listRecursively.filter(f => f.extension == Some(".java") || f.extension == Some(".scala"))
Upvotes: 9
Reputation: 50506
for (file <- new File("c:\\").listFiles) { processFile(file) }
http://langref.org/scala+java/files
Upvotes: 20
Reputation: 4846
It seems nobody has mentioned the scala-io library from scala-incubator...
import scalax.file.Path
Path.fromString("c:\\temp") ** "a*.foo"
Or with an implicit conversion:
import scalax.file.ImplicitConversions.string2path
"c:\\temp" ** "a*.foo"
Or, if you want to use the implicit conversion explicitly...
import scalax.file.Path
import scalax.file.ImplicitConversions.string2path
val dir: Path = "c:\\temp"
dir ** "a*.foo"
Documentation is available here: http://jesseeichar.github.io/scala-io-doc/0.4.3/index.html#!/file/glob_based_path_sets
Upvotes: 1
Reputation: 9319
The simplest Scala-only solution (if you don't mind requiring the Scala compiler library):
val path = scala.reflect.io.Path(dir)
scala.tools.nsc.io.Path.onlyFiles(path.walk).foreach(println)
Otherwise, @Renaud's solution is short and sweet (if you don't mind pulling in Apache Commons FileUtils):
import scala.collection.JavaConversions._ // enables foreach
import org.apache.commons.io.FileUtils
FileUtils.listFiles(dir, null, true).foreach(println)
Where dir is a java.io.File:
new File("path/to/dir")
Upvotes: 3
Reputation: 498
I personally like the elegance and simplicity of @Rex Kerr's proposed solution. But here is what a tail-recursive version might look like:
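import java.io.File
import scala.annotation.tailrec
def listFiles(file: File): List[File] = {
  @tailrec
  def loop(files: List[File], result: List[File]): List[File] = files match {
    case Nil => result
    case head :: tail if head.isDirectory =>
      // listFiles returns null on an I/O error, hence the Option wrapper
      loop(Option(head.listFiles).map(_.toList ::: tail).getOrElse(tail), result)
    case head :: tail if head.isFile =>
      loop(tail, head :: result)
    case _ :: tail =>
      // neither a directory nor a regular file (e.g. a broken symlink): skip it
      loop(tail, result)
  }
  loop(List(file), Nil)
}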
Upvotes: 5
Reputation: 1883
Scala has the library 'scala.reflect.io', which is considered experimental but does the job:
import scala.reflect.io.Path
Path(path) walkFilter { p =>
  p.isDirectory || """^a.*\.foo$""".r.findFirstIn(p.name).isDefined
}
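Note that a glob such as "a*.foo" is not the same thing as a regex: read as a regex, a* means "zero or more a characters" and the dot matches any character. A minimal sketch of a glob-to-regex converter follows; it handles only * and ?, and the names Glob and toRegex are hypothetical, not part of any library.

```scala
import java.util.regex.Pattern
import scala.util.matching.Regex

object Glob {
  // Hypothetical helper: translates a simple glob into an anchored regex.
  // '*' matches any run of characters, '?' matches exactly one character,
  // and every other character is quoted so that it matches literally.
  def toRegex(glob: String): Regex = {
    val sb = new StringBuilder("^")
    glob.foreach {
      case '*' => sb.append(".*")
      case '?' => sb.append('.')
      case c   => sb.append(Pattern.quote(c.toString))
    }
    sb.append('$')
    sb.toString.r
  }
}
```

The resulting regex can be passed to any of the regex-based filters in this thread, for example Glob.toRegex("a*.foo").findFirstIn(p.name).isDefined.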
Upvotes: 3
Reputation: 458
Why are you using Java's File instead of Scala's AbstractFile?
With Scala's AbstractFile, the iterator support allows writing a more concise version of James Moore's solution:
import scala.reflect.io.AbstractFile
def tree(root: AbstractFile, descendCheck: AbstractFile => Boolean = {_=>true}): Stream[AbstractFile] =
if (root == null || !root.exists) Stream.empty
else
(root.exists, root.isDirectory && descendCheck(root)) match {
case (false, _) => Stream.empty
case (true, true) => root #:: root.iterator.flatMap { tree(_, descendCheck) }.toStream
case (true, false) => Stream(root)
}
Upvotes: -1
Reputation: 2004
How about
def allFiles(path:File):List[File]=
{
val parts=path.listFiles.toList.partition(_.isDirectory)
parts._2 ::: parts._1.flatMap(allFiles)
}
Upvotes: 3
Reputation: 16521
Apache Commons IO's FileUtils fits on one line and is quite readable:
import scala.collection.JavaConversions._ // important for 'foreach'
import org.apache.commons.io.FileUtils
FileUtils.listFiles(new File("c:\\temp"), Array("foo"), true).foreach { f =>
  // process f here
}
Upvotes: 6
Reputation: 1912
This incantation works for me:
def findFiles(dir: File, criterion: (File) => Boolean): Seq[File] = {
if (dir.isFile) Seq()
else {
val (files, dirs) = dir.listFiles.partition(_.isFile)
files.filter(criterion) ++ dirs.toSeq.map(findFiles(_, criterion)).foldLeft(Seq[File]())(_ ++ _)
}
}
Upvotes: 0
Reputation: 9026
And here's a mixture of the stream solution from @DuncanMcGregor with the filter from @Rick-777:
def tree( root: File, descendCheck: File => Boolean = { _ => true } ): Stream[File] = {
require(root != null)
def directoryEntries(f: File) = for {
direntries <- Option(f.list).toStream
d <- direntries
} yield new File(f, d)
val shouldDescend = root.isDirectory && descendCheck(root)
( root.exists, shouldDescend ) match {
case ( false, _) => Stream.Empty
case ( true, true ) => root #:: ( directoryEntries(root) flatMap { tree( _, descendCheck ) } )
case ( true, false) => Stream( root )
}
}
def treeIgnoringHiddenFilesAndDirectories( root: File ) = tree( root, { !_.isHidden } ) filter { !_.isHidden }
This gives you a Stream[File] instead of a (potentially huge and very slow) List[File] while letting you decide which sorts of directories to recurse into with the descendCheck() function.
Upvotes: 3
Reputation: 10268
Here's a similar solution to Rex Kerr's, but incorporating a file filter:
import java.io.File
def findFiles(fileFilter: (File) => Boolean = (f) => true)(f: File): List[File] = {
val ss = f.list()
val list = if (ss == null) {
Nil
} else {
ss.toList.sorted
}
val visible = list.filter(_.charAt(0) != '.')
val these = visible.map(new File(f, _))
these.filter(fileFilter) ++ these.filter(_.isDirectory).flatMap(findFiles(fileFilter))
}
The method returns a List[File], which is slightly more convenient than Array[File]. It also ignores all files and directories that are hidden (i.e. whose names begin with '.').
It's partially applied using a file filter of your choosing, for example:
val srcDir = new File( ... )
val htmlFiles = findFiles( _.getName endsWith ".html" )( srcDir )
Upvotes: 1
Reputation: 18177
I like yura's stream solution, but it (and the others) recurses into hidden directories. We can also simplify by making use of the fact that listFiles returns null for a non-directory.
def tree(root: File, skipHidden: Boolean = false): Stream[File] =
if (!root.exists || (skipHidden && root.isHidden)) Stream.empty
else root #:: (
root.listFiles match {
case null => Stream.empty
case files => files.toStream.flatMap(tree(_, skipHidden))
})
Now we can list files
tree(new File(".")).filter(f => f.isFile && f.getName.endsWith(".html")).foreach(println)
or realise the whole stream for later processing
tree(new File("dir"), true).toArray
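Stream is deprecated as of Scala 2.13 in favor of LazyList. A sketch of the same function on LazyList, assuming Scala 2.13 or later (the enclosing object name LazyTree is only for illustration):

```scala
import java.io.File

object LazyTree {
  // Same traversal as above, but on LazyList (assumes Scala 2.13+).
  // The root itself is included; children are only listed when demanded.
  def tree(root: File, skipHidden: Boolean = false): LazyList[File] =
    if (!root.exists || (skipHidden && root.isHidden)) LazyList.empty
    else root #:: (root.listFiles match {
      case null  => LazyList.empty // not a directory, or an I/O error
      case files => files.iterator.to(LazyList).flatMap(tree(_, skipHidden))
    })
}
```

Usage is unchanged, for example LazyTree.tree(new File(".")).filter(_.isFile).foreach(println).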
Upvotes: 11
Reputation: 12567
Scala is a multi-paradigm language. A good "scala-esque" way of iterating a directory would be to reuse existing code!
I'd consider using commons-io a perfectly scala-esque way of iterating a directory. You can use some implicit conversions to make it easier. Like
import org.apache.commons.io.filefilter.IOFileFilter
implicit def newIOFileFilter (filter: File=>Boolean) = new IOFileFilter {
def accept (file: File) = filter (file)
def accept (dir: File, name: String) = filter (new java.io.File (dir, name))
}
Upvotes: 11
Reputation: 167911
Scala code typically uses Java classes for dealing with I/O, including reading directories. So you have to do something like:
import java.io.File
def recursiveListFiles(f: File): Array[File] = {
val these = f.listFiles
these ++ these.filter(_.isDirectory).flatMap(recursiveListFiles)
}
You could collect all the files and then filter using a regex:
myBigFileArray.filter(f => """.*\.html$""".r.findFirstIn(f.getName).isDefined)
Or you could incorporate the regex into the recursive search:
import scala.util.matching.Regex
def recursiveListFiles(f: File, r: Regex): Array[File] = {
val these = f.listFiles
val good = these.filter(f => r.findFirstIn(f.getName).isDefined)
good ++ these.filter(_.isDirectory).flatMap(recursiveListFiles(_,r))
}
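One caveat with the functions above: File.listFiles returns null when the path is not a directory or cannot be read, so they can throw a NullPointerException on unreadable directories. A minimal null-safe sketch, which also returns only non-directory entries, might look like this (the names RecursiveList, listFilesRecursively and matching are hypothetical):

```scala
import java.io.File

object RecursiveList {
  // Returns every non-directory entry below `dir`.
  // Option(...) turns the null from listFiles into an empty list.
  def listFilesRecursively(dir: File): List[File] = {
    val entries = Option(dir.listFiles).map(_.toList).getOrElse(Nil)
    val (dirs, files) = entries.partition(_.isDirectory)
    files ++ dirs.flatMap(listFilesRecursively)
  }

  // Keeps only files whose whole name matches the regex `pattern`.
  def matching(dir: File, pattern: String): List[File] =
    listFilesRecursively(dir).filter(_.getName.matches(pattern))
}
```

For the pattern in the question, the glob "a*.foo" corresponds to the regex "a.*\\.foo", so the call would be RecursiveList.matching(new File("c:\\temp"), "a.*\\.foo").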
Upvotes: 124
Reputation: 7963
Take a look at scala.tools.nsc.io
There are some very useful utilities there including deep listing functionality on the Directory class.
If I remember correctly, this was highlighted (possibly contributed) by retronym and was seen as a stopgap until io gets a fresh and more complete implementation in the standard library.
Upvotes: 3