Reputation: 51
I am trying to run our pipelines as steps on EMR, but I am stuck on the step below.
"logPath" : "s3://ekin-logs/",
"masterInstanceType" : "m5.xlarge",
"slaveInstanceType" : "m5.xlarge",
"instanceCount" : 2,
"subnetIds" : $SUBNET_ID,
"ec2KeyName" : "ekin-analytics",
"applications" : ["Spark","Hadoop"],
"args" : [
"spark-submit",
"--master", "yarn",
"--executor-memory", "8G",
"--driver-memory", "7G",
"--deploy-mode","cluster",
"--class","com.testinium.analytics.AppCommonDataSource",
"--conf","spark.eventLog.enabled=true",
"s3://analytics-emr-test/ekin-spark-app.jar",
"--prefixOutputDir", "hdfs:///home/hadoop/data/customer",
"--maxTimeGapThreshold","180000",
"--domainId", "13",
"--submitId", "1",
"--startTime" ,"1543664538237",
"--endTime", "1551994119153"
],
"jar" : "command-runner.jar",
"name" : "AppCommonDataSource",
"actionOnFailure" : "CANCEL_AND_WAIT"
At first, I was getting the error below:
ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 10.4 GB of 8.3 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead or disabling yarn.nodemanager.vmem-check-enabled because of YARN-4714.
I searched Stack Overflow and found that people solved this problem by adding the yarn.nodemanager.vmem-check-enabled parameter to their yarn-site.xml (Stack Overflow solution).
I added this parameter as well, but nothing changed.
The yarn-site parameters of my cluster:
yarnProperties.put("yarn.scheduler.maximum-allocation-mb", 10240);
yarnProperties.put("yarn.nodemanager.resource.memory-mb", 10240);
yarnProperties.put("yarn.nodemanager.vmem-check-enabled", "false");
yarnProperties.put("yarn.nodemanager.pmem-check-enabled", "false");
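As a rough sanity check on these numbers, the sketch below estimates the YARN container size Spark requests for the executor and for the driver (which also runs in a YARN container in cluster mode). It assumes Spark's documented default overhead of max(384 MiB, 10% of the requested memory), which applies only when spark.executor.memoryOverhead / spark.driver.memoryOverhead are not set. The class and method names are hypothetical, YARN's rounding to its allocation increment is ignored, and the figures may not match the exact limit printed in the error if the cluster's defaults differ.

```java
public class ContainerSizing {
    // Assumed Spark defaults: overhead = max(384 MiB, 10% of the requested memory).
    static final long MIN_OVERHEAD_MB = 384;
    static final double OVERHEAD_FACTOR = 0.10;

    // Default memory overhead for a given heap size, in MiB (fraction truncated).
    static long defaultOverheadMb(long memoryMb) {
        return Math.max(MIN_OVERHEAD_MB, (long) (memoryMb * OVERHEAD_FACTOR));
    }

    // Approximate container request: heap plus default overhead.
    static long containerMb(long memoryMb) {
        return memoryMb + defaultOverheadMb(memoryMb);
    }

    public static void main(String[] args) {
        long executorMb = containerMb(8 * 1024); // --executor-memory 8G
        long driverMb = containerMb(7 * 1024);   // --driver-memory 7G
        long nodeMb = 10240;                     // yarn.nodemanager.resource.memory-mb

        System.out.println("executor container (MiB): " + executorMb);
        System.out.println("driver container (MiB):   " + driverMb);
        System.out.println("both fit in one NodeManager budget: "
                + (executorMb + driverMb <= nodeMb));
    }
}
```

Under these assumptions the two requests together exceed the 10240 MiB a single NodeManager offers; whether they actually share a node depends on how the cluster places the driver.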
This is the error I get now:
ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Container from a bad node: container_1591270643256_0002_01_000002 on host: ip-172-31-35-232.eu-west-1.compute.internal. Exit status: 137. Diagnostics: Container killed on request. Exit code is 137
Note: It sometimes works with only a master node.
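For readers decoding the second error: exit status 137 follows the common Unix convention of 128 plus the signal number, so it corresponds to signal 9 (SIGKILL). With YARN's own memory checks disabled, one plausible but unconfirmed source of that signal is the operating system's out-of-memory killer. A minimal sketch of the decoding, with a hypothetical helper name:

```java
public class ExitStatus {
    // Returns the signal number if the status follows the 128 + N convention, else -1.
    static int signalFromExitStatus(int status) {
        return (status > 128 && status < 256) ? status - 128 : -1;
    }

    public static void main(String[] args) {
        int signal = signalFromExitStatus(137);
        // Signal 9 is SIGKILL on Linux.
        System.out.println("signal: " + signal + (signal == 9 ? " (SIGKILL)" : ""));
    }
}
```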
Upvotes: 1
Views: 15583
Reputation: 780
Add EBS volumes to your nodes. M5 instances don't come with any instance storage.
Upvotes: 0