Ranjit Jhala

Reputation: 1242

Iterating over the lines of a file

I'd like to write a simple function that iterates over the lines of a text file. I believe in 2.8 one could do:

def lines(filename: String) : Iterator[String] = { 
    scala.io.Source.fromFile(filename).getLines
}

and that was that, but in 2.9 the above doesn't work and instead I must do:

def lines(filename: String) : Iterator[String] = { 
    scala.io.Source.fromFile(new File(filename)).getLines()
}

Now, the trouble is, I want to compose the above iterators in a for comprehension:

for ( l1 <- lines("file1.txt"); l2 <- lines("file2.txt") ){ 
    do_stuff(l1, l2) 
}

Again, this used to work fine with 2.8, but in 2.9 it throws a "too many open files" exception. This is understandable: the second call to lines in the comprehension opens (and never closes) a file for each line of the first.
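To make the "one open per outer line" point concrete, here is a minimal sketch of how a for loop without yield is rewritten into nested foreach calls. The names DesugarDemo and fakeLines are made up for illustration; fakeLines stands in for lines and just counts calls instead of opening a file.

```scala
object DesugarDemo {
  // Counts how many times the inner "file" would have been opened.
  var opened = 0

  // Hypothetical stand-in for lines(filename): bumps the counter and
  // returns an iterator over in-memory data instead of touching the disk.
  def fakeLines(items: List[String]): Iterator[String] = {
    opened += 1
    items.iterator
  }

  def run(): Int = {
    opened = 0
    val outer = List("a", "b", "c")
    val inner = List("x", "y")
    // for (l1 <- outer.iterator; l2 <- fakeLines(inner)) { ... }
    // is rewritten by the compiler into nested foreach calls:
    outer.iterator.foreach { l1 =>
      fakeLines(inner).foreach { l2 => () }
    }
    opened // one "open" per element of the outer iterator
  }
}
```

With three outer elements the inner generator is evaluated three times, so with a real file and no close, the number of open handles grows with the size of the first file.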

In my case, I know that "file1.txt" is big and I don't want to suck it into memory, but the second file is small, so I can write a different linesEager like so:

def linesEager(filename: String): Iterator[String] = {
    val buf = scala.io.Source.fromFile(new File(filename))
    // Force all lines into memory before closing the underlying file.
    val zs  = buf.getLines().toList.toIterator
    buf.close()
    zs
}

and then turn my for-comprehension into:

for (l1 <- lines("file1.txt"); l2 <- linesEager("file2.txt")){ 
    do_stuff(l1, l2) 
}

This works, but is clearly ugly. Can someone suggest a uniform and clean way of achieving the above? It seems like you need a way for the iterator returned by lines to close the file when it reaches the end. Was this happening in 2.8, which is why it worked there?
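For illustration, the kind of iterator described here could look roughly like the sketch below. The names SelfClosingLines and linesClosing are hypothetical, not part of scala.io, and this is only one possible approach: it closes the file when the last line has been consumed, so if the caller abandons the iterator early the file stays open.

```scala
import scala.io.Source

object SelfClosingLines {
  // Hypothetical helper: wraps a Source's line iterator and closes the
  // Source as soon as hasNext reports that no lines are left.
  def linesClosing(filename: String): Iterator[String] = new Iterator[String] {
    private val source = Source.fromFile(filename)
    private val underlying = source.getLines()
    private var closed = false

    def hasNext: Boolean = {
      // Once closed, never touch the underlying iterator again.
      val more = !closed && underlying.hasNext
      if (!more && !closed) {
        source.close()
        closed = true
      }
      more
    }

    def next(): String = {
      if (!hasNext) throw new NoSuchElementException("no more lines")
      underlying.next()
    }
  }
}
```

Used as the second generator in the for comprehension, each inner iterator would be run to the end by foreach and so would close its file before the next one is opened.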

Thanks!

BTW -- here is a minimal version of the full program that shows the issue:

import java.io.PrintWriter
import java.io.File

object Fail { 

  def lines(filename: String) : Iterator[String] = { 
    val f = new File(filename)
    scala.io.Source.fromFile(f).getLines()
  }

  def main(args: Array[String]) = { 
    val smallFile = args(0)
    val bigFile   = args(1)

    println("helloworld")

    for ( w1 <- lines(bigFile)
        ; w2 <- lines(smallFile)
        ) 
    {
      if (w2 == w1){
        val msg = "%s=%s\n".format(w1, w2)
        println("found" + msg)
      }
    }

    println("goodbye")
  }

}

On 2.9.0 I compile with scalac WordsFail.scala and then I get this:

rjhala@goto:$ scalac WordsFail.scala 
rjhala@goto:$ scala Fail passwd words
helloworld
java.io.FileNotFoundException: passwd (Too many open files)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:120)
    at scala.io.Source$.fromFile(Source.scala:91)
    at scala.io.Source$.fromFile(Source.scala:76)
    at Fail$.lines(WordsFail.scala:8)
    at Fail$$anonfun$main$1.apply(WordsFail.scala:18)
    at Fail$$anonfun$main$1.apply(WordsFail.scala:17)
    at scala.collection.Iterator$class.foreach(Iterator.scala:652)
    at scala.io.BufferedSource$BufferedLineIterator.foreach(BufferedSource.scala:30)
    at Fail$.main(WordsFail.scala:17)
    at Fail.main(WordsFail.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at scala.tools.nsc.util.ScalaClassLoader$$anonfun$run$1.apply(ScalaClassLoader.scala:78)
    at scala.tools.nsc.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:24)
    at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.asContext(ScalaClassLoader.scala:88)
    at scala.tools.nsc.util.ScalaClassLoader$class.run(ScalaClassLoader.scala:78)
    at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.run(ScalaClassLoader.scala:101)
    at scala.tools.nsc.ObjectRunner$.run(ObjectRunner.scala:33)
    at scala.tools.nsc.ObjectRunner$.runAndCatch(ObjectRunner.scala:40)
    at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:56)
    at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:80)
    at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:89)
    at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)

Upvotes: 8

Views: 9717

Answers (2)

leedm777

Reputation: 24032

scala-arm provides a great mechanism for automagically closing resources when you're done with them.

import resource._
import scala.io.Source

for (file1 <- managed(Source.fromFile("file1.txt"));
     l1 <- file1.getLines();
     file2 <- managed(Source.fromFile("file2.txt"));
     l2 <- file2.getLines()) {
  do_stuff(l1, l2)
}

But unless you're counting on the contents of file2.txt to change while you're looping through file1.txt, it would be best to read that into a List before you loop. There's no need to convert it into an Iterator.
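A minimal sketch of that suggestion using only the standard library, in case pulling in scala-arm is not an option. The names EagerSmallFile, readAll and forEachPair are made up for this example; f plays the role of do_stuff.

```scala
import scala.io.Source

object EagerSmallFile {
  // Hypothetical helper: reads every line into a List and closes the file
  // right away. toList is strict, so all lines are read before close() runs.
  def readAll(filename: String): List[String] = {
    val source = Source.fromFile(filename)
    try source.getLines().toList
    finally source.close()
  }

  // Streams the big file line by line and pairs each line with every line
  // of the small file, which is held in memory. Each file is opened once.
  def forEachPair(bigFile: String, smallFile: String)(f: (String, String) => Unit): Unit = {
    val small = readAll(smallFile)
    val big = Source.fromFile(bigFile)
    try {
      for (l1 <- big.getLines(); l2 <- small) f(l1, l2)
    } finally big.close()
  }
}
```

Because small is a List, iterating it again for each l1 costs no file handles, and the try/finally around the big file closes it even if f throws.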

Upvotes: 14

Heiko Seeberger

Reputation: 3722

Maybe you should take a look at scala-arm (https://github.com/jsuereth/scala-arm) and let the closing of the files (file input streams) happen automatically in the background.

Upvotes: 2
