Reputation: 4278
I'm working on a job in which Hive queries use R scripts that are distributed across the cluster and run on each node, like this:
ADD FILES hdfs://path/reducers/my_script.R;
SET hive.mapred.reduce.tasks.speculative.execution=false;
SET mapred.reduce.tasks=80;
INSERT OVERWRITE TABLE final_output_table
PARTITION (partition_column1, partition_column2)
SELECT selected_column1, selected_column2, partition_column1, partition_column2
FROM (
    FROM (
        SELECT input_column1, input_column2, input_column3, partition_column1, partition_column2
        FROM input_table
        WHERE partition_column1 = ${parameter1}
        AND partition_column2 = ${parameter2}
        DISTRIBUTE BY concat(input_column1, partition_column1)
    ) mapped
    REDUCE input_column1, input_column2, input_column3, partition_column1, partition_column2
    USING 'my_script.R'
    AS selected_column1, selected_column2, partition_column1, partition_column2
) reduced;
(Hopefully there's no mistake in this simplified version of my code; I'm quite confident there is none in the real one.)
Some of the many reduce tasks succeed (17 on my last try, 58 on the previous one), some are killed (64 on the last try, 23 on the previous one), and some fail (31 on the last try, 25 on the previous one).
You'll find the full log of one of the failed reduce attempts at the bottom of the question in case it's needed, but if I'm not mistaken, here are the important parts:
Container [pid=14521, containerID=container_1508303276896_0052_01_000045] is running beyond physical memory limits.
Current usage: 3.1 GB of 3 GB physical memory used; 6.5 GB of 12 GB virtual memory used.
Killing container.
[...]
Container killed on request.
Exit code is 143 Container exited with a non-zero exit code 143
What I understand: the maths done in my_script.R take too much physical memory.
Let's assume that no improvement can be made to the code in my_script.R, and that the way the rows are distributed can't be changed either.
My question then is: what can I do to avoid using too much memory?
Or, maybe (since some reducers succeed): why do only some of the reducers exceed the limit?
In case it's useful:
Average Map Time 1mins, 3sec
Average Shuffle Time 10sec
Average Merge Time 1sec
Average Reduce Time 7mins, 5sec
Full log of one of the failed reduce attempts (from the Hadoop job monitoring console, ports 8088 and 19888):
Container [pid=14521,containerID=container_1508303276896_0052_01_000045] is running beyond physical memory limits.
Current usage: 3.1 GB of 3 GB physical memory used; 6.5 GB of 12 GB virtual memory used.
Killing container.
Dump of the process-tree for container_1508303276896_0052_01_000045:
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 15690 14650 14521 14521 (R) 5978 434 2956750848 559354 /usr/lib/R/bin/exec/R --slave --no-restore --file=/mnt/bi/hadoop_tmp/nm-local-dir/usercache/hadoop/appcache/application_1508303276896_0052/container_1508303276896_0052_01_000045/./my_script.R
|- 14650 14521 14521 14521 (java) 3837 127 3963912192 262109 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx2048m -Djava.io.tmpdir=/mnt/bi/hadoop_tmp/nm-local-dir/usercache/hadoop/appcache/application_1508303276896_0052/container_1508303276896_0052_01_000045/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/mnt/bi/hadoop_tmp/userlogs/application_1508303276896_0052/container_1508303276896_0052_01_000045 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 10.32.128.5 20021 attempt_1508303276896_0052_r_000014_0 45
|- 14521 20253 14521 14521 (bash) 1 2 13578240 677 /bin/bash -c /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx2048m -Djava.io.tmpdir=/mnt/bi/hadoop_tmp/nm-local-dir/usercache/hadoop/appcache/application_1508303276896_0052/container_1508303276896_0052_01_000045/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/mnt/bi/hadoop_tmp/userlogs/application_1508303276896_0052/container_1508303276896_0052_01_000045 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 10.32.128.5 20021 attempt_1508303276896_0052_r_000014_0 45
1>/mnt/bi/hadoop_tmp/userlogs/application_1508303276896_0052/container_1508303276896_0052_01_000045/stdout
2>/mnt/bi/hadoop_tmp/userlogs/application_1508303276896_0052/container_1508303276896_0052_01_000045/stderr
Container killed on request.
Exit code is 143 Container exited with a non-zero exit code 143
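(Reading the process-tree dump, and assuming the usual 4 KB page size: the R process uses about 559354 × 4 KB ≈ 2.1 GB of resident memory and the JVM that spawned it about 262109 × 4 KB ≈ 1.0 GB, which together account for the 3.1 GB measured against the container's 3 GB limit. So the R script and its parent JVM share the same memory budget.)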
Upvotes: 1
Views: 3729
Reputation: 9067
If your reduce steps are borderline with just 3 GB, just give them 4 GB...!
SET mapreduce.reduce.memory.mb=4096;
Unless you are using Tez, which sizes its generic containers with a specific property of its own, hive.tez.container.size.
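For instance, in the Hive session (a sketch; the -Xmx values follow the common rule of thumb of roughly 80% of the container size and are my assumption, not mandated values):

-- MapReduce engine: bigger reducer containers, with a matching JVM heap
SET mapreduce.reduce.memory.mb=4096;
SET mapreduce.reduce.java.opts=-Xmx3276m;

-- Tez engine: container size is driven by Hive's own property
SET hive.tez.container.size=4096;
SET hive.tez.java.opts=-Xmx3276m;

Keep in mind that the container limit applies to the whole process tree (the JVM plus the R child process it spawns), so a large JVM heap leaves less physical memory for R.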
Upvotes: 2
Reputation: 4278
OK, I'd love more explanation, but in the meantime, here's a trial-and-error answer:
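For example, one can combine a bigger container with more (hence smaller) reducers, so that each R process holds less data (illustrative values, to be tuned by trial and error on your own cluster):

-- Give each reducer container more physical memory
SET mapreduce.reduce.memory.mb=4096;
-- Keep the JVM heap modest so the R child process gets the rest of the container
SET mapreduce.reduce.java.opts=-Xmx2048m;
-- Spread the DISTRIBUTE BY keys over more reducers, so each one processes less data
SET mapred.reduce.tasks=160;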
Upvotes: 0