oleber

Reputation: 1079

Why does Spark application in Docker container fail with OutOfMemoryError: Java heap space?

I'm using an r4.8xlarge on the AWS Batch service to run Spark. This is already a big machine: 32 vCPUs and 244 GB of memory. On AWS Batch the process runs inside a Docker container. From multiple sources, I read that we should run java with the parameters:

-XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=1

Even with these parameters, the process never went over 31 GB of resident memory and 45 GB of virtual memory.
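For reference, the effect of these flags can be checked against an explicit container memory limit; a minimal sketch (the openjdk:8 image and the 8 GB limit are only illustrative, not my actual setup):

$ docker run --rm -m 8g openjdk:8 java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=1 -XshowSettings:vm -version

With -m 8g and MaxRAMFraction=1, the estimated max heap should come out close to the 8 GB container limit.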

Here are the analyses I did.

first test

java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=1 -XshowSettings:vm -version
VM settings:
    Max. Heap Size (Estimated): 26.67G
    Ergonomics Machine Class: server
    Using VM: OpenJDK 64-Bit Server VM

openjdk version "1.8.0_151"
OpenJDK Runtime Environment (build 1.8.0_151-8u151-b12-1~deb9u1-b12)
OpenJDK 64-Bit Server VM (build 25.151-b12, mixed mode)

second test

docker run -it --rm 650967531325.dkr.ecr.eu-west-1.amazonaws.com/java8_aws java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=2 -XshowSettings:vm -version
VM settings:
    Max. Heap Size (Estimated): 26.67G
    Ergonomics Machine Class: server
    Using VM: OpenJDK 64-Bit Server VM

openjdk version "1.8.0_151"
OpenJDK Runtime Environment (build 1.8.0_151-8u151-b12-1~deb9u1-b12)
OpenJDK 64-Bit Server VM (build 25.151-b12, mixed mode)

third test

java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=10 -XshowSettings:vm -version
VM settings:
    Max. Heap Size (Estimated): 11.38G
    Ergonomics Machine Class: server
    Using VM: OpenJDK 64-Bit Server VM

openjdk version "1.8.0_151"
OpenJDK Runtime Environment (build 1.8.0_151-8u151-b12-1~deb9u1-b12)
OpenJDK 64-Bit Server VM (build 25.151-b12, mixed mode)

The system is built with Native Packager as a standalone application. The SparkSession is built as follows, with Cores equal to 31 (32 - 1):

SparkSession
  .builder()
  .appName(applicationName)
  .master(s"local[$Cores]")
  .config("spark.executor.memory", "3g")
  .getOrCreate()

In reply to egorlitvinenko:

$ docker stats
CONTAINER ID        NAME                                                                    CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
0c971993f830        ecs-marcos-BatchIntegration-DedupOrder-3-default-aab7fa93f0a6f1c86800   1946.34%            27.72GiB / 234.4GiB   11.83%              0B / 0B             72.9MB / 160kB      0
a5d6bf5522f6        ecs-agent                                                               0.19%               19.56MiB / 240.1GiB   0.01%               0B / 0B             25.7MB / 930kB      0

In more tests, now with the Oracle JDK, the memory never went over 4 GB:

$ 'spark-submit' '--class' 'integration.deduplication.DeduplicationApp' '--master' 'local[31]' '--executor-memory' '3G' '--driver-memory' '3G' '--conf' '-Xmx=150g' '/localName.jar' '--inPath' 's3a://dp-import-marcos-refined/platform-services/order/merged/*/*/*/*' '--outPath' 's3a://dp-import-marcos-refined/platform-services/order/deduplicated' '--jobName' 'DedupOrder' '--skuMappingPath' 's3a://dp-marcos-dwh/redshift/item_code_mapping'

I used the parameters -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=2 on my Spark run, and it is clearly not using all the memory. How can I fix this issue?

Upvotes: 1

Views: 2253

Answers (1)

Jacek Laskowski

Reputation: 74779

tl;dr Use --driver-memory and --executor-memory when you spark-submit your Spark application, or set the proper memory settings of the JVM that hosts it.


The memory for the driver is 1024M by default, which you can check with spark-submit:

--driver-memory MEM Memory for driver (e.g. 1000M, 2G) (Default: 1024M).

The memory for an executor is 1G by default, which you can again check with spark-submit:

--executor-memory MEM Memory per executor (e.g. 1000M, 2G) (Default: 1G).
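Both defaults are printed by the launcher itself; a quick way to see them (a sketch) is:

$ spark-submit --help 2>&1 | grep -i memory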

With that said, it does not really matter how much memory your execution environment has in total, as a Spark application won't use more than the default 1G for the driver and executors.
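You can verify this from inside the running application; a minimal sketch (assuming spark is the SparkSession, as in spark-shell):

// maximum heap the driver JVM was actually started with
val maxHeapGb = Runtime.getRuntime.maxMemory.toDouble / (1024 * 1024 * 1024)
println(f"Driver JVM max heap: $maxHeapGb%.2f GB")
// spark.driver.memory as the application sees it (unset unless passed at launch)
println(spark.sparkContext.getConf.getOption("spark.driver.memory").getOrElse("spark.driver.memory not set"))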

Since you use the local master URL, the memory settings of the driver's JVM are already fixed when you execute your Spark application. It is simply too late to set them while creating a SparkSession. The single JVM of the Spark application (with the driver and a single executor all running in the same JVM) is already up, so no config can change it.
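Because the application is packaged with Native Packager, the heap has to be set on the launching java process itself. A minimal sketch (the script name, the -J passthrough of the generated start script, and the 200g size are assumptions, not taken from your setup):

./bin/deduplication-app -J-Xmx200g --inPath ... --outPath ...

or, when launching with plain java:

java -Xmx200g -cp localName.jar integration.deduplication.DeduplicationApp --inPath ... --outPath ...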

In other words, how much memory a Docker container has has no impact on how much memory the Spark application uses. They are environments configured independently. Of course, the more memory a Docker container has, the more a process inside it could ever get (so they are indeed interconnected).

Use --driver-memory and --executor-memory when you spark-submit your Spark application, or set the proper memory settings of the JVM that hosts it.
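For example, something like the following should let the single local-mode JVM use most of the machine; a sketch (the 200g figure is only illustrative; with local[31] the --driver-memory setting governs the one JVM that runs both the driver and the executor):

$ spark-submit \
    --class integration.deduplication.DeduplicationApp \
    --master 'local[31]' \
    --driver-memory 200g \
    /localName.jar --inPath ... --outPath ...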

Upvotes: 1
