Reputation: 869
My Spark job is failing with java.lang.OutOfMemoryError: Java heap space.
I tried playing around with config params such as executor-cores, executor-memory, num-executors, driver-cores, driver-memory, spark.yarn.driver.memoryOverhead, and spark.yarn.executor.memoryOverhead according to Ramzy's answer. Below is my configuration:
--master yarn-cluster --executor-cores 4 --executor-memory 10G --num-executors 30 \
--driver-cores 4 --driver-memory 16G --queue team_high \
--conf spark.eventLog.dir=hdfs:///spark-history \
--conf spark.eventLog.enabled=true \
--conf spark.yarn.historyServer.address=xxxxxxxxx:xxxx \
--conf spark.sql.tungsten.enabled=true \
--conf spark.ui.port=5051 \
--conf spark.sql.shuffle.partitions=30 \
--conf spark.yarn.driver.memoryOverhead=1024 \
--conf spark.yarn.executor.memoryOverhead=1400 \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.shuffle.service.enabled=true \
--conf spark.sql.orc.filterPushdown=true \
--conf spark.scheduler.mode=FAIR \
--conf hive.exec.dynamic.partition=false \
--conf hive.exec.dynamic.partition.mode=nonstrict \
--conf mapreduce.fileoutputcommitter.algorithm.version=2 \
--conf orc.stripe.size=67108864 \
--conf hive.merge.orcfile.stripe.level=true \
--conf hive.merge.smallfiles.avgsize=2560000 \
--conf hive.merge.size.per.task=2560000 \
--conf spark.driver.extraJavaOptions='-XX:+UseCompressedOops -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps' \
--conf spark.executor.extraJavaOptions='-XX:+UseCompressedOops -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC'
It works sometimes and fails most of the time with the above-mentioned error. While debugging, I found the GC log below. Can someone help me understand these logs and help me tune this job?
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill %p"
# Executing /bin/sh -c "kill 79911"...
Heap
PSYoungGen total 2330112K, used 876951K [0x00000006eab00000, 0x00000007c0000000, 0x00000007c0000000)
eden space 1165312K, 75% used [0x00000006eab00000,0x0000000720365f50,0x0000000731d00000)
from space 1164800K, 0% used [0x0000000731d00000,0x0000000731d00000,0x0000000778e80000)
to space 1164800K, 0% used [0x0000000778e80000,0x0000000778e80000,0x00000007c0000000)
ParOldGen total 6990848K, used 6990706K [0x0000000540000000, 0x00000006eab00000, 0x00000006eab00000)
object space 6990848K, 99% used [0x0000000540000000,0x00000006eaadc9c0,0x00000006eab00000)
Metaspace used 69711K, capacity 70498K, committed 72536K, reserved 1112064K
class space used 9950K, capacity 10182K, committed 10624K, reserved 1048576K
End of LogType:stdout
Upvotes: 0
Views: 807
Reputation: 354
I have encountered intermittent memory issues while running Spark on a cluster, and I have found that this mainly happens for the following reasons:
1) RDD partitions might simply be too large to process. You can decrease the partition size by increasing the number of partitions with the repartition API, as shown below; this reduces the amount of data each task has to process. Since you have given each executor 10G and 4 cores, an executor runs 4 concurrent tasks (one partition each), and those 4 tasks share the 10G among themselves, which means roughly 2.5G to process one partition.
val rddWithMorePartitions = rdd.repartition(rdd.getNumPartitions*2)
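As a rough sketch of that arithmetic (the doubling factor and variable names here are just an illustration, not a fixed rule), you could inspect the current partition count first and then repartition:
// Illustrative sketch: check how many partitions the RDD currently has,
// then increase the count so each task works on a smaller slice.
// With --executor-memory 10G and --executor-cores 4, each task has
// roughly 10G / 4 = 2.5G available, so smaller partitions lower the OOM risk.
val currentPartitions = rdd.getNumPartitions
println(s"Current partitions: $currentPartitions")

// Doubling is only a starting point; keep increasing until tasks stop failing.
val rddWithMorePartitions = rdd.repartition(currentPartitions * 2)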
2) If your use case is computation-intensive and you are not doing any caching, you can reduce the memory allocated for storage by tweaking the parameter below.
spark.storage.memoryFraction=0.6 (default)
You can lower it, for example to:
spark.storage.memoryFraction=0.5
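If you prefer to set this in code rather than at submit time, a minimal sketch (assuming you build your own SparkConf; the variable names are mine) would be:
// Sketch: lower the storage fraction so more of the heap is left for
// execution when nothing is being cached. Note that this knob belongs to the
// older (pre-unified) memory manager, so check which memory manager your
// Spark version uses before relying on it.
val conf = new org.apache.spark.SparkConf()
  .set("spark.storage.memoryFraction", "0.5")
val sc = new org.apache.spark.SparkContext(conf)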
3) You should consider increasing the executor memory to something above 25 GB, e.g.:
--executor-memory 26G
Upvotes: 6