timbit

Reputation: 353

What is the cause of OutOfMemoryError in Scala?

I'm only just starting to learn Scala, coming from Python. I was attempting a basic file processing task in Scala. The task is to remove substrings like "[ ... ]" from data files using regex. The script successfully processes the first few files and then throws a java.lang.OutOfMemoryError: Java heap space error. The data file at which the error occurs is about 70 MB, and I have 16 GB of RAM at my disposal. (The preceding 6 files are each under 100 KB, except the first, which is 5.5 MB.)

My question is: what causes the OutOfMemoryError, and how can I change my approach to prevent it from happening? I don't understand why it happens. I have little experience in debugging memory errors, as Python is relatively forgiving in memory management.

Any additional comments on coding style or the methods I use are more than welcome - I am eager to learn.

Regexer.scala:

import scala.io.Source 
import java.io._

object Regexer {

  def main(args: Array[String]): Unit = {

    val filenames = Source.fromFile("all_files.txt").getLines()

    for (fn <- filenames) {

        val datafile:String = Source.fromFile(fn).mkString

        val new_data:String = datafile.replaceAll(raw"\[.*?\]", "")

        val file = new File(fn)         
        val bw = new BufferedWriter(new FileWriter(file))
        bw.write(new_data)
        bw.close()


    }   
  } 
}

all_files.txt is a file containing paths to all files to process (as they are located in subdirectories).

Finally, the complete error message thrown upon execution:

java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3332)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:596)
    at java.lang.StringBuilder.append(StringBuilder.java:190)
    at scala.collection.mutable.StringBuilder.appendAll(StringBuilder.scala:249)
    at scala.io.BufferedSource.mkString(BufferedSource.scala:97)
    at Regexer$$anonfun$main$1.apply(Regexer.scala:12)
    at Regexer$$anonfun$main$1.apply(Regexer.scala:10)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at Regexer$.main(Regexer.scala:10)
    at Regexer.main(Regexer.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at scala.reflect.internal.util.ScalaClassLoader$$anonfun$run$1.apply(ScalaClassLoader.scala:70)
    at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
    at scala.reflect.internal.util.ScalaClassLoader$URLClassLoader.asContext(ScalaClassLoader.scala:101)
    at scala.reflect.internal.util.ScalaClassLoader$class.run(ScalaClassLoader.scala:70)
    at scala.reflect.internal.util.ScalaClassLoader$URLClassLoader.run(ScalaClassLoader.scala:101)
    at scala.tools.nsc.CommonRunner$class.run(ObjectRunner.scala:22)
    at scala.tools.nsc.ObjectRunner$.run(ObjectRunner.scala:39)
    at scala.tools.nsc.CommonRunner$class.runAndCatch(ObjectRunner.scala:29)
    at scala.tools.nsc.ObjectRunner$.runAndCatch(ObjectRunner.scala:39)
    at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:65)
    at scala.tools.nsc.MainGenericRunner.run$1(MainGenericRunner.scala:87)
    at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:98)
    at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:103)
    at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)

Upvotes: 2

Views: 7388

Answers (3)

jlncrnt

Reputation: 46

To add to puhlen's answer, you can read a file line by line with:

import scala.io.Source
for (line <- Source.fromFile("myfile.txt").getLines()) {
  println(line) // process each line here
}

Upvotes: 1

puhlen

Reputation: 8529

You might have 16 GiB on your computer, but that doesn't mean the JVM can use all of it. Scala code (normally) runs in the Java Virtual Machine (JVM), which has its own memory limit. The default amount of memory available might be too low for your program. The maximum heap size for your process can be set with the -Xmx option. Try something like java -Xmx1024m Regexer or java -Xmx2g Regexer, or however much memory you think should work. If you still get the error after increasing the available memory, then you either have a memory leak or your algorithm needs to be optimized.
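To see how much heap the JVM is actually allowed to use, a small check like the following can help. This is only a sketch: the object name HeapCheck is made up for illustration, and it relies on nothing beyond the standard java.lang.Runtime API.

```scala
object HeapCheck {

  /** Maximum heap the JVM will attempt to use, in mebibytes. */
  def maxHeapMiB: Long = Runtime.getRuntime.maxMemory / (1024L * 1024L)

  def main(args: Array[String]): Unit =
    println(s"Max heap: $maxHeapMiB MiB")
}
```

Run it the same way you run Regexer so that both see the same settings. The stack trace in the question shows the program being started through the scala launcher (MainGenericRunner) rather than java directly; as far as I know that launcher forwards JVM options given with a -J prefix, for example scala -J-Xmx2g Regexer.scala, but treat that as an assumption to verify against your Scala version.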

In your specific case, instead of loading the entire file into memory, consider processing it line by line, or in chunks of some other fixed size, so that at any time you only need to keep a small portion of the file in memory.
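A line-by-line version of the script from the question might look roughly like this. It is a sketch under a few assumptions, not a drop-in replacement: the files are assumed to be UTF-8 text, the names LineRegexer and stripBrackets are made up for illustration, and the output goes to a temporary file first because the input cannot be overwritten while it is still being read.

```scala
import java.io.File
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, StandardCopyOption}
import scala.io.Source

object LineRegexer {

  // Same pattern as in the question; compiled once and reused for every line.
  private val Bracketed = raw"\[.*?\]".r

  /** Removes "[ ... ]" substrings from one file, one line at a time. */
  def stripBrackets(path: String): Unit = {
    val target = new File(path).getAbsoluteFile
    // Temporary file in the same directory as the input (assumed writable).
    val tmp = File.createTempFile("regexer", ".tmp", target.getParentFile)
    val in = Source.fromFile(target, "UTF-8") // assumption: input is UTF-8
    val out = Files.newBufferedWriter(tmp.toPath, StandardCharsets.UTF_8)
    try {
      for (line <- in.getLines()) {
        out.write(Bracketed.replaceAllIn(line, ""))
        out.newLine()
      }
    } finally {
      out.close()
      in.close()
    }
    // Replace the original only after the whole file has been written.
    Files.move(tmp.toPath, target.toPath, StandardCopyOption.REPLACE_EXISTING)
  }

  def main(args: Array[String]): Unit = {
    val names = Source.fromFile("all_files.txt")
    try names.getLines().foreach(stripBrackets)
    finally names.close()
  }
}
```

Since "." does not match line terminators by default, the pattern never spans more than one line, so working per line should remove the same substrings as the whole-file version. Two caveats: line endings are rewritten with the platform separator and the output always ends with a newline, and a file that consists of one enormous line would still be read into memory in one piece.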

Upvotes: 9

Nagarjuna Pamu

Reputation: 14825

Don't load the whole file into memory at once:

val datafile:String = Source.fromFile(fn).mkString //this should be the culprit.

If processing line by line is not possible, try increasing the heap size of the JVM instead.

Upvotes: 2
