Shivanand Pawar

Reputation: 547

Executor "running beyond physical memory limits" when running a big join in Spark

I am getting the following error in the driver during a big join in Spark.

We have 3 nodes with 32 GB of RAM each, and the total input size of the join is 150 GB. (The same app runs fine when the input size is 50 GB.)

I have set spark.storage.memoryFraction to 0.2 and spark.shuffle.memoryFraction to 0.2, but I still keep getting the "running beyond physical memory limits" error.

15/04/07 19:58:17 INFO yarn.YarnAllocator: Container marked as failed: container_1426882329798_0674_01_000002. Exit status: 143. Diagnostics: Container [pid=51382,containerID=container_1426882329798_0674_01_000002] is running beyond physical memory limits. Current usage: 16.1 GB of 16 GB physical memory used; 16.8 GB of 33.6 GB virtual memory used. Killing container. Dump of the process-tree for container_1426882329798_0674_01_000002 : |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 51387 51382 51382 51382 (java) 717795 50780 17970946048 4221191 /usr/jdk64/jdk1.7.0_45/bin/java -server -XX:OnOutOfMemoryError=kill %p -Xms14336m -Xmx14336m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+StartAttachListener -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9010 -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dlog4j.configuration=file:/softwares/log4j.properties -Djava.io.tmpdir=/hadoop/yarn/local/usercache/hdfs/appcache/application_1426882329798_0674/container_1426882329798_0674_01_000002/tmp -Dspark.driver.port=20763 -Dspark.ui.port=0 -Dspark.yarn.app.container.log.dir=/hadoop/yarn/log/application_1426882329798_0674/container_1426882329798_0674_01_000002 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url akka.tcp://[email protected]:20763/user/CoarseGrainedScheduler --executor-id 1 --hostname maxiq1.augmentiq.in --cores 4 --app-id application_1426882329798_0674 --user-class-path file:/hadoop/yarn/local/usercache/hdfs/appcache/application_1426882329798_0674/container_1426882329798_0674_01_000002/app.jar

Please help me out with this.

Upvotes: 1

Views: 3532

Answers (2)

Shivanand Pawar

Reputation: 547

We faced a similar issue before. We tried changing all of the Spark configurations, but with no luck.

Later we found that it was an issue with the data itself: the key we used in the join was not unique. Some keys had around 4000-5000 rows in both tables, so for such a key the join produced roughly 5000 * 5000 = 25 million output rows, making that executor run out of memory.

You may want to check your data. Run some profiling on the input, such as a group-by on the join key followed by a count. That may give you some insight.
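As a rough illustration of the blow-up (plain Python with made-up per-key counts, not the actual tables), the number of rows an inner join emits for each key is the product of that key's row counts on the two sides, so one skewed key can dwarf everything else:

```python
from collections import Counter

# Hypothetical per-key row counts for the two join inputs.
left = Counter({"normal_key": 10, "skewed_key": 5000})
right = Counter({"normal_key": 10, "skewed_key": 5000})

# An inner join emits left[k] * right[k] rows for each shared key k.
rows_per_key = {k: left[k] * right[k] for k in left.keys() & right.keys()}

print(rows_per_key["normal_key"])   # 100 rows: harmless
print(rows_per_key["skewed_key"])   # 25,000,000 rows from a single key
```

In Spark the same profiling idea is a groupBy on the join key with a count on each input before joining; any key whose counts are large on both sides is a candidate for the skew causing the OOM.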

Upvotes: 3

pu239ppy

Reputation: 129

You could try setting --executor-memory to something below this limit. The limit comes from YARN's container sizing, configured in yarn-site.xml (yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb); if it is not set there, YARN's defaults apply.

You can also try to increase the limit if your nodes have more memory. Details and some sizing guidance may be found in http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/.
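A minimal sketch of what that could look like (the memory numbers are illustrative, not a recommendation; on YARN the executor JVM heap plus the off-heap overhead configured by spark.yarn.executor.memoryOverhead must together fit inside the container limit, which the log above shows capped at 16 GB):

```shell
# Illustrative sizing only: 12g heap + 2048m overhead stays comfortably
# under a 16 GB YARN container limit.
spark-submit \
  --master yarn \
  --executor-memory 12g \
  --conf spark.yarn.executor.memoryOverhead=2048 \
  --num-executors 3 \
  --executor-cores 4 \
  your-app.jar
```

If the container is still killed for exceeding physical memory, raising the overhead (rather than the heap) is usually the relevant knob, since the kill is based on the whole process tree's resident memory, not just the JVM heap.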

In general, keep in mind that your resource allocation is controlled by YARN, so familiarizing yourself with how YARN works and how to debug it is a good idea.

Upvotes: -1
