Reputation: 3463
It's a function to find the third largest of a collection of integers. I'm calling it like this:
val lineStream = thirdLargest(Source.fromFile("10m.txt").getLines.toIterable
val intStream = lineStream map { s => Integer.parseInt(s) }
thirdLargest(intStream)
The file 10m.txt
contains 10 million lines with a random integer on each one. The thirdLargest
function below should not be keeping any of the integers after it has tested them, and yet it causes the JVM to run out of memory (after about 90 seconds in my case).
def thirdLargest(numbers: Iterable[Int]): Option[Int] = {
def top3of4(top3: List[Int], fourth: Int) = top3 match {
case List(a, b, c) =>
if (fourth > c) List(b, c, fourth)
else if (fourth > b) List(b, fourth, c)
else if (fourth > a) List(fourth, b, c)
else top3
}
@tailrec
def find(top3: List[Int], rest: Iterable[Int]): Int = (top3, rest) match {
case (List(a, b, c), Nil) => a
case (top3, d #:: rest) => find(top3of4(top3, d), rest)
}
numbers match {
case a #:: b #:: c #:: rest => Some(find(List[Int](a, b, c).sorted, rest))
case _ => None
}
}
Upvotes: 0
Views: 292
Reputation: 1193
The OOM error has nothing to do with the way you read the file. It is totally fine and even recommended to use Source.getLines
here. The problem is elsewhere.
Many people are being confused by the nature of Scala Stream
concept. In fact this is not something you would want to use just to iterate over things. It is lazy indeed however it doesn't discard previous results – they're being memoized so there's no need to recalculate them again on the next use (which never happens in your case but that's where your memory goes). See also this answer.
Consider using foldLeft
. Here's a working (but intentionally simplified) example for illustration purposes:
val lines = Source.fromFile("10m.txt").getLines()
print(lines.map(_.toInt).foldLeft(-1 :: -1 :: -1 :: Nil) { (best3, next) =>
(next :: best3).sorted.reverse.take(3)
})
Upvotes: 2