core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://Master:9000</value>
  </property>
</configuration>
hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/mohamed/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/mohamed/datanode/</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
mapred-site.xml:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>localhost:10020</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>
</configuration>
yarn-site.xml:
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>Master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
Exports in my ~/.bashrc:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_HOME=/home/mohamed/hadoop-3.3.6
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export PATH=$PATH:/home/mohamed/spark-3.5.0
Here $HADOOP_HOME resolves to /home/mohamed/hadoop-3.3.6.
**Script of mapper.py:**
#!/usr/bin/env python3
import sys

for line in sys.stdin:
    line = line.strip()
    words = line.split()
    for word in words:
        print("%s\t%d" % (word, 1))
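To sanity-check the mapper logic without Hadoop, the same per-line splitting can be exercised in plain Python (the helper name `map_line` is just for illustration, not part of my setup):

```python
# Pure-Python equivalent of the mapper's per-line logic:
# emit one "word<TAB>1" pair per whitespace-separated token.
def map_line(line):
    return ["%s\t%d" % (word, 1) for word in line.strip().split()]

if __name__ == "__main__":
    for pair in map_line("hello world hello"):
        print(pair)
```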
Script of reducer.py:
#!/usr/bin/env python3
import sys

total = 0
lastword = None
for line in sys.stdin:
    line = line.strip()
    word, count = line.split()
    count = int(count)
    if lastword is None:
        lastword = word
    if word == lastword:
        total += count
    else:
        print("%s\t%d occurrences" % (lastword, total))
        total = count
        lastword = word
# flush the final word once stdin is exhausted,
# otherwise the last key is never printed
if lastword is not None:
    print("%s\t%d occurrences" % (lastword, total))
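The whole map → sort → reduce pipeline can likewise be simulated in plain Python before submitting to the cluster; this is only a local sketch of what streaming does (the function names `reduce_sorted` and `wordcount` are illustrative, not Hadoop APIs):

```python
# Simulate Hadoop Streaming locally: sort the (word, count) pairs the
# way the shuffle phase would, then fold runs of identical keys the
# same way the reducer script does.
def reduce_sorted(pairs):
    out = []
    lastword, total = None, 0
    for word, count in pairs:
        if lastword is None:
            lastword = word
        if word == lastword:
            total += count
        else:
            out.append((lastword, total))
            lastword, total = word, count
    if lastword is not None:  # flush the final key
        out.append((lastword, total))
    return out

def wordcount(lines):
    # "map" step: one (word, 1) pair per token
    pairs = [(w, 1) for line in lines for w in line.strip().split()]
    # "shuffle" + "reduce" steps
    return reduce_sorted(sorted(pairs))

if __name__ == "__main__":
    print(wordcount(["hello world", "hello hadoop"]))
    # → [('hadoop', 1), ('hello', 2), ('world', 1)]
```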
HDFS and YARN run fine; their web UIs are reachable on their respective ports, 9870 and 8088.
**The command I run for my MapReduce job:**
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.3.6.jar \
    -input /MMdata/Overview.txt \
    -output /results \
    -mapper /home/mohamed/mapper.py \
    -reducer /home/mohamed/reducer.py
Once I run this command, my MapReduce job produces these logs:
2023-10-17 12:04:57,865 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
packageJobJar: [/tmp/hadoop-unjar1033840378945881812/] [] /tmp/streamjob8466353576267893322.jar tmpDir=null
2023-10-17 12:04:59,228 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at Master/192.168.144.41:8032
2023-10-17 12:04:59,755 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at Master/192.168.144.41:8032
2023-10-17 12:05:00,296 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/mohamed/.staging/job_1697530620860_0027
2023-10-17 12:05:00,969 INFO mapred.FileInputFormat: Total input files to process : 1
2023-10-17 12:05:01,204 INFO mapreduce.JobSubmitter: number of splits:2
2023-10-17 12:05:54,790 WARN hdfs.DataStreamer: Slow waitForAckedSeqno took 53218ms (threshold=30000ms). File being written: /tmp/hadoop-yarn/staging/mohamed/.staging/job_1697530620860_0027/job.xml, block: BP-1651669171-192.168.162.41-1697114500534:blk_1073755253_14430, Write pipeline datanodes: [DatanodeInfoWithStorage[192.168.144.232:9866,DS-9a5dac38-b0e3-4530-a67c-b52419a0ca9f,DISK], DatanodeInfoWithStorage[192.168.144.92:9866,DS-6837ad2a-8cd2-40cf-94ad-b76aecc76d4d,DISK], DatanodeInfoWithStorage[192.168.144.74:9866,DS-71881df1-f738-449a-bb3a-9fe2bf0f75d1,DISK]].
2023-10-17 12:05:54,795 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1697530620860_0027
2023-10-17 12:05:54,795 INFO mapreduce.JobSubmitter: Executing with tokens: []
2023-10-17 12:05:55,263 INFO conf.Configuration: found resource resource-types.xml at file:/home/mohamed/hadoop-3.3.6/etc/hadoop/resource-types.xml
2023-10-17 12:05:55,438 INFO impl.YarnClientImpl: Submitted application application_1697530620860_0027
2023-10-17 12:05:55,520 INFO mapreduce.Job: The url to track the job: http://Master:8088/proxy/application_1697530620860_0027/
2023-10-17 12:05:55,533 INFO mapreduce.Job: Running job: job_1697530620860_0027
2023-10-17 12:06:06,781 INFO mapreduce.Job: Job job_1697530620860_0027 running in uber mode : false
2023-10-17 12:06:06,784 INFO mapreduce.Job: map 0% reduce 0%
2023-10-17 12:06:25,228 INFO mapreduce.Job: Task Id : attempt_1697530620860_0027_m_000000_0, Status : FAILED
2023-10-17 12:06:25,255 INFO mapreduce.Job: Task Id : attempt_1697530620860_0027_m_000001_0, Status : FAILED
2023-10-17 12:06:33,508 INFO mapreduce.Job: Task Id : attempt_1697530620860_0027_m_000001_1, Status : FAILED
Error: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:115)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:81)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:140)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:463)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:350)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)
Caused by: java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:115)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:81)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:140)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
... 14 more
Caused by: java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
... 17 more
Caused by: java.lang.RuntimeException: configuration exception
at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
... 22 more
Caused by: java.io.IOException: Cannot run program "/home/mohamed/mapper.py": error=2, No such file or directory
at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1128)
at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1071)
at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
... 23 more
Caused by: java.io.IOException: error=2, No such file or directory
at java.base/java.lang.ProcessImpl.forkAndExec(Native Method)
at java.base/java.lang.ProcessImpl.<init>(ProcessImpl.java:340)
at java.base/java.lang.ProcessImpl.start(ProcessImpl.java:271)
at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1107)
... 25 more
2023-10-17 12:06:40,636 INFO mapreduce.Job: Task Id : attempt_1697530620860_0027_m_000000_1, Status : FAILED
2023-10-17 12:06:47,750 INFO mapreduce.Job: Task Id : attempt_1697530620860_0027_m_000000_2, Status : FAILED
Error: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:115)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:81)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:140)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:463)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:350)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)
Caused by: java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:115)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:81)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:140)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
... 14 more
Caused by: java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
... 17 more
Caused by: java.lang.RuntimeException: configuration exception
at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
... 22 more
Caused by: java.io.IOException: Cannot run program "/home/mohamed/mapper.py": error=2, No such file or directory
at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1128)
at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1071)
at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
... 23 more
Caused by: java.io.IOException: error=2, No such file or directory
at java.base/java.lang.ProcessImpl.forkAndExec(Native Method)
at java.base/java.lang.ProcessImpl.<init>(ProcessImpl.java:340)
at java.base/java.lang.ProcessImpl.start(ProcessImpl.java:271)
at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1107)
... 25 more
2023-10-17 12:06:48,789 INFO mapreduce.Job: Task Id : attempt_1697530620860_0027_m_000001_2, Status : FAILED
2023-10-17 12:07:02,022 INFO mapreduce.Job: map 50% reduce 100%
2023-10-17 12:07:03,050 INFO mapreduce.Job: map 100% reduce 100%
2023-10-17 12:07:03,093 INFO mapreduce.Job: Job job_1697530620860_0027 failed with state FAILED due to: Task failed task_1697530620860_0027_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0 killedMaps:0 killedReduces: 0
2023-10-17 12:07:03,233 INFO mapreduce.Job: Counters: 14
Job Counters
Failed map tasks=7
Killed map tasks=1
Killed reduce tasks=1
Launched map tasks=8
Other local map tasks=6
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=94672
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=94672
Total vcore-milliseconds taken by all map tasks=94672
Total megabyte-milliseconds taken by all map tasks=96944128
Map-Reduce Framework
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
2023-10-17 12:07:03,235 ERROR streaming.StreamJob: Job not successful!
Streaming Command Failed!
It looks like my script path is not found, even though I specify it explicitly and opened up all permission restrictions on it with chmod 777. I'm using Ubuntu 22.04 and hadoop-3.3.6. By the way, I asked ChatGPT about it; the answer was that my paths to the mapper and reducer files are likely incorrect, but they are correct and both files exist in /home/mohamed.
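For what it's worth, here is a quick check one can run to confirm a script path exists and is executable on the machine running the check (the paths are the ones from my command; the helper name `check_script` is just for illustration). Note it only tests the local node, while the failure is reported from inside the YARN task containers:

```python
import os

def check_script(path):
    """Return (exists, executable) for a script path on this machine."""
    return os.path.exists(path), os.access(path, os.X_OK)

if __name__ == "__main__":
    for p in ("/home/mohamed/mapper.py", "/home/mohamed/reducer.py"):
        exists, execok = check_script(p)
        print("%s exists=%s executable=%s" % (p, exists, execok))
```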
Please, any help would be appreciated. Thank you all.
I'm a new user of the Hadoop distribution, and I'm working on a simple example of a MapReduce job; but when I execute the command, it doesn't work. To help you understand what I did, here are all my configuration files plus the mapper and reducer Python scripts. Please, if anyone can help me resolve this problem.