Reputation: 5832
I am following the Apache Map Reduce tutorial and I am at the point of assigning input and output directories. I created both directories here:
~/projects/hadoop/WordCount/input/
~/projects/hadoop/WordCount/output/
but when I run fs, the file and directory are not found. I am running as the ubuntu user, and it owns the directories and the input file.
Based on a proposed solution below, I then tried:
Found my HDFS directory with hdfs dfs -ls /, which is /tmp
Created input/ and output/ inside /tmp with mkdir
Tried to copy the local .jar to HDFS:
hadoop fs -copyFromLocal ~projects/hadoop/WordCount/wc.jar /tmp
Received:
copyFromLocal: `~projects/hadoop/WordCount/wc.jar': No such file or directory
Any troubleshooting ideas? Thanks
Upvotes: 0
Views: 3062
Reputation: 18270
MapReduce expects the input and output paths to be directories in HDFS, not local ones, unless the cluster is configured in local mode. Also, the input directory must exist and the output directory must not.
For example:
If the input is /mapreduce/wordcount/input/, this directory must be created with all the input files in it. Use HDFS commands to create them:
hdfs dfs -mkdir -p /mapreduce/wordcount/input/
hdfs dfs -copyFromLocal file1 file2 file3 /mapreduce/wordcount/input/
where file1, file2, and file3 are locally available input files.
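To confirm the files landed in HDFS, you can list the input directory (same example path as above):
hdfs dfs -ls /mapreduce/wordcount/input/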
If the output is /examples/wordcount/output/, the parent directories must exist, but not the output/ directory itself; Hadoop creates it on job execution.
hdfs dfs -mkdir -p /examples/wordcount/
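If a previous run has already created the output directory, the job will fail with an "output directory already exists" error; in that case, remove it first (same example path as above):
hdfs dfs -rm -r /examples/wordcount/output/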
The jar used for the job, in this case wc.jar, should reside locally; on execution, provide its absolute or relative local path to the command.
So the final command would look like:
hadoop jar /path/where/the/jar/is/wc.jar ClassName /mapreduce/wordcount/input/ /examples/wordcount/output/
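Once the job completes, you can inspect the results in the output directory (part-r-00000 is the conventional name of a single reducer's output file; list the directory first if unsure):
hdfs dfs -ls /examples/wordcount/output/
hdfs dfs -cat /examples/wordcount/output/part-r-00000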
Upvotes: 1
Reputation: 1088
As the Hadoop InvalidInputException suggests, it cannot find the location "/home/ubuntu/projects/hadoop/WordCount/input".
Is that a local or an HDFS path? I think it is local, which is why the input exception is happening.
To execute a jar file you have to put the jar in an HDFS directory, and the input and output directories also have to be in HDFS.
Use the copyFromLocal command to copy the jar from local to an HDFS directory:
hadoop fs -copyFromLocal <localsrc>/wc.jar hadoop-dir
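Separately, the copyFromLocal error in the question is a shell issue rather than an HDFS one: ~projects (without a slash) is expanded by the shell as the home directory of a user named projects, whereas ~/projects points inside the current user's home. Assuming the file layout described in the question, the corrected command would be:
hadoop fs -copyFromLocal ~/projects/hadoop/WordCount/wc.jar /tmp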
Upvotes: 1