Isvara
Isvara

Reputation: 3463

Why does this function run out of memory?

It's a function to find the third largest of a collection of integers. I'm calling it like this:

val lineStream = thirdLargest(Source.fromFile("10m.txt").getLines.toIterable
val intStream = lineStream map { s => Integer.parseInt(s) }
thirdLargest(intStream) 

The file 10m.txt contains 10 million lines with a random integer on each one. The thirdLargest function below should not be keeping any of the integers after it has tested them, and yet it causes the JVM to run out of memory (after about 90 seconds in my case).

def thirdLargest(numbers: Iterable[Int]): Option[Int] = {
    def top3of4(top3: List[Int], fourth: Int) = top3 match {
        case List(a, b, c) =>
            if (fourth > c) List(b, c, fourth)
            else if (fourth > b) List(b, fourth, c)
            else if (fourth > a) List(fourth, b, c)
            else top3
    }

    @tailrec
    def find(top3: List[Int], rest: Iterable[Int]): Int = (top3, rest) match {
        case (List(a, b, c), Nil) => a
        case (top3, d #:: rest) => find(top3of4(top3, d), rest)
    }

    numbers match {
        case a #:: b #:: c #:: rest => Some(find(List[Int](a, b, c).sorted, rest))
        case _ => None
    }
}

Upvotes: 0

Views: 292

Answers (1)

coffeesnake
coffeesnake

Reputation: 1193

The OOM error has nothing to do with the way you read the file. It is totally fine and even recommended to use Source.getLines here. The problem is elsewhere.

Many people are being confused by the nature of Scala Stream concept. In fact this is not something you would want to use just to iterate over things. It is lazy indeed however it doesn't discard previous results – they're being memoized so there's no need to recalculate them again on the next use (which never happens in your case but that's where your memory goes). See also this answer.

Consider using foldLeft. Here's a working (but intentionally simplified) example for illustration purposes:

val lines = Source.fromFile("10m.txt").getLines()

print(lines.map(_.toInt).foldLeft(-1 :: -1 :: -1 :: Nil) { (best3, next) =>
  (next :: best3).sorted.reverse.take(3)
})

Upvotes: 2

Related Questions