penlight
penlight

Reputation: 627

Hadoop Streaming Error No such file or directory

I study Hadoop and test Hadoop Streaming using Ruby whether my MapReduce algorithm could work without error.

So, I did next command.

hadoop jar hadoop-streaming-2.7.2.jar -files mapper.rb,reducer.rb -mapper mapper.rb -reducer reducer.rb -input test.json -output test

But, next error occurred.

dir/usercache/Kuma/appcache/application_1469093819516_0005/container_1469093819516_0005_01_000002/./mapper.rb": error=2, No such file or directory

Also, next is one job all error.

16/07/21 19:22:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
packageJobJar: [/var/folders/l_/21nh6rmn6f3fn3vd55kh0b5h0000gn/T/hadoop-unjar8708804571112102348/] [] /var/folders/l_/21nh6rmn6f3fn3vd55kh0b5h0000gn/T/streamjob5933629893966751936.jar tmpDir=null
16/07/21 19:22:05 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
16/07/21 19:22:05 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
16/07/21 19:22:06 INFO mapred.FileInputFormat: Total input paths to process : 1
16/07/21 19:22:06 INFO mapreduce.JobSubmitter: number of splits:2
16/07/21 19:22:06 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1469093819516_0005
16/07/21 19:22:06 INFO impl.YarnClientImpl: Submitted application application_1469093819516_0005
16/07/21 19:22:06 INFO mapreduce.Job: The url to track the job: http://hatanokaoruakira-no-MacBook-Air.local:8088/proxy/application_1469093819516_0005/
16/07/21 19:22:06 INFO mapreduce.Job: Running job: job_1469093819516_0005
16/07/21 19:22:13 INFO mapreduce.Job: Job job_1469093819516_0005 running in uber mode : false
16/07/21 19:22:13 INFO mapreduce.Job:  map 0% reduce 0%
16/07/21 19:22:18 INFO mapreduce.Job: Task Id : attempt_1469093819516_0005_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:449)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
... 14 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
... 17 more
Caused by: java.lang.RuntimeException: configuration exception
    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
    at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
... 22 more
Caused by: java.io.IOException: Cannot run program "/private/tmp/hadoop-Kuma/nm-local-dir/usercache/Kuma/appcache/application_1469093819516_0005/container_1469093819516_0005_01_000002/./mapper.rb": error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
    ... 23 more
Caused by: java.io.IOException: error=2, No such file or directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:248)
    at java.lang.ProcessImpl.start(ProcessImpl.java:134)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    ... 24 more

mapper.rb and reducer.rb exist at current directory which executed the command. wordcount MapReduce for testing is running without error, so I think that Hadoop setting is ok.

Environment
Mac El Capitan
Hadoop 2.7.2(presudo distributed mode)

Upvotes: 1

Views: 4730

Answers (1)

Ronak Patel
Ronak Patel

Reputation: 3849

files specified with -files options must be in hdfs:

command:

hadoop jar hadoop-streaming.jar -files f1.txt,f2.txt -input f1.txt -output test1

Error:

Input path does not exist: hdfs://quickstart.cloudera:8020/user/cloudera/f1.txt

After copying files to hdfs (in your case you will need to copy to hdfs root dir - (according to error in your log)):

hadoop fs -put f1.txt /user/cloudera
hadoop fs -put f2.txt /user/cloudera

job ran with NO errors:

[cloudera@quickstart hadoop-mapreduce]$ hadoop jar hadoop-streaming.jar -files f1.txt,f2.txt -input f1.txt -output test1
packageJobJar: [] [/usr/jars/hadoop-streaming-2.6.0-cdh5.7.0.jar] /tmp/streamjob5729321067745308196.jar tmpDir=null
16/07/23 11:36:16 INFO client.RMProxy: Connecting to ResourceManager at quickstart.cloudera/127.0.0.1:8032
16/07/23 11:36:17 INFO client.RMProxy: Connecting to ResourceManager at quickstart.cloudera/127.0.0.1:8032
16/07/23 11:36:18 INFO mapred.FileInputFormat: Total input paths to process : 1
16/07/23 11:36:18 INFO mapreduce.JobSubmitter: number of splits:1

HadoopStreaming - Making_Files_Available_to_Tasks:

The -files and -archives options allow you to make files and archives available to the tasks. The argument is a URI to the file or archive that you have already uploaded to HDFS. These files and archives are cached across jobs. You can retrieve the host and fs_port values from the fs.default.name config variable.

Note: The -files and -archives options are generic options. Be sure to place the generic options before the command options, otherwise the command will fail.

Upvotes: 2

Related Questions