Reputation: 21
I'm trying to execute a small calculation using Scala. I'm using DataStax 4.6. I have 6 nodes, each with 16 GB of RAM and 8 cores. When I try to execute the Scala program, it displays the following error:

ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-17] shutting down ActorSystem [sparkDriver] java.lang.OutOfMemoryError: Java heap space

I have allocated 2 cores to each machine, executor memory is 4 GB, and driver memory is 4 GB. Any suggestions?
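For context, here is a minimal Scala sketch of how a setup like the one described above is typically expressed (Spark 1.x style). The app name is a placeholder, and the core count assumes "2 cores per machine" means a cluster-wide cap of 2 × 6 nodes:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch of the configuration described in the question; values mirror the question,
// and the app name is a placeholder.
val conf = new SparkConf()
  .setAppName("SmallCalculation")
  .set("spark.cores.max", "12")        // assumption: 2 cores x 6 nodes
  .set("spark.executor.memory", "4g")
// Note: driver memory generally must be set before the driver JVM starts,
// e.g. `spark-submit --driver-memory 4g ...`, not inside the running application.

val sc = new SparkContext(conf)
```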
Upvotes: 0
Views: 762
Reputation: 7305
Directly quoting Russ's article on Common Spark Troubleshooting (you should read it!):
Spark Executor OOM: How to set Memory Parameters on Spark

Once an app is running, the next most likely error you will see is an OOM on a Spark executor. Spark is an extremely powerful tool for doing in-memory computation, but its power comes with some sharp edges. The most common cause of an executor OOMing is that the application is trying to cache or load too much information into memory. Depending on your use case, there are several solutions to this:
1) Increase the parallelism of your job. Try increasing the number of partitions in your job. By splitting the work into smaller sets of data, less information has to be resident in memory at a given time. For a Spark Cassandra Connector job this means decreasing the split size variable, spark.cassandra.input.split.size, which can be set either on the command line or in the SparkConf object (see the sketch after this list). For other RDD types, look into their APIs to determine exactly how they set partition size.
2) Increase the storage fraction variable, spark.storage.memoryFraction. This can also be set on the command line or in the SparkConf object. This variable sets exactly how much of the JVM heap will be dedicated to the caching and storage of RDDs. Set it to a value between 0 and 1 describing what portion of executor JVM memory will be dedicated to caching RDDs. If you have a job that requires very little shuffle memory but will utilize a lot of cached RDDs, increase this variable (example: caching an RDD and then performing aggregates on it).
3) If all else fails, you may just need additional RAM on each worker. For DSE users, adjust your spark-env.sh file (or dse.yaml in DSE 4.6) to increase SPARK_MEM, the memory reserved for Spark jobs. You will need to restart your workers for these new memory limits to take effect (dse sparkworker restart). Then increase the amount of RAM the application requests by setting the spark.executor.memory variable, either on the command line or in the SparkConf object.
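Pulling the three points above together, here is a minimal Scala sketch of how they might be set through a SparkConf in a DSE 4.6 / Spark 1.x era application. The app name and the specific values are placeholders for illustration, not tuning recommendations:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch combining points 1-3 above (Spark 1.x / DSE 4.6 era properties).
// All values are illustrative placeholders, not recommendations.
val conf = new SparkConf()
  .setAppName("SmallCalculation")
  // 1) Smaller Cassandra splits -> more partitions, so less data per task is held in memory
  .set("spark.cassandra.input.split.size", "10000")
  // 2) Portion of the executor heap reserved for cached RDDs (a value between 0 and 1)
  .set("spark.storage.memoryFraction", "0.7")
  // 3) Heap requested per executor; must fit under the worker's SPARK_MEM limit
  .set("spark.executor.memory", "4g")

val sc = new SparkContext(conf)
```

The same properties can also be passed on the command line when submitting the job (for example with `--conf spark.executor.memory=4g` on spark-submit versions that support it), which is what the article means by setting them "on the command line".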
Upvotes: 1