Slinky

Reputation: 5832

Hadoop cannot see my input directory

I am following the Apache MapReduce tutorial and I am at the point of assigning input and output directories. I created both directories here:

~/projects/hadoop/WordCount/input/
~/projects/hadoop/WordCount/output/

but when I run the hadoop fs command, the file and directory are not found. I am running as the ubuntu user, which owns the directories and the input file.

Based on a proposed solution below, I then tried:

Found my HDFS directory with hdfs dfs -ls /, which is /tmp

Created input/ and output/ inside /tmp with mkdir (roughly the commands sketched below)
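In case it helps, those steps were along the lines of the following commands (reconstructed from memory, so the exact form may have differed slightly):

hdfs dfs -ls /
hdfs dfs -mkdir /tmp/input
hdfs dfs -mkdir /tmp/output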

Tried to copy the local .jar to HDFS:

hadoop fs -copyFromLocal ~projects/hadoop/WordCount/wc.jar /tmp

Received:

copyFromLocal: `~projects/hadoop/WordCount/wc.jar': No such file or directory


Any troubleshooting ideas? Thanks

Upvotes: 0

Views: 3062

Answers (2)

franklinsijo

Reputation: 18270

MapReduce expects the input and output paths to be directories in HDFS, not local paths, unless the cluster is configured in local mode. Also, the input directory must already exist and the output directory must not.
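A quick way to check which filesystem the cluster resolves job paths against (assuming a standard installation with the hdfs command on the PATH) is to query the fs.defaultFS property:

hdfs getconf -confKey fs.defaultFS

If this prints an hdfs:// URI, paths passed to the job are resolved in HDFS; if it prints file:///, the cluster is running against the local filesystem.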

For example:

If the input is /mapreduce/wordcount/input/, this directory must be created with all the input files in it. Use HDFS commands to create them:

hdfs dfs -mkdir -p /mapreduce/wordcount/input/
hdfs dfs -copyFromLocal file1 file2 file3 /mapreduce/wordcount/input/

where file1, file2 and file3 are locally available input files.
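To confirm the files actually landed in HDFS, listing the directory should show them:

hdfs dfs -ls /mapreduce/wordcount/input/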

And if the output is /examples/wordcount/output/, the parent directories must exist but not the output/ directory itself; Hadoop creates it on job execution.

hdfs dfs -mkdir -p /examples/wordcount/
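If a previous run already created /examples/wordcount/output/, the job will fail complaining that the output directory exists; remove it before rerunning:

hdfs dfs -rm -r /examples/wordcount/output/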

The jar used for the job, in this case wc.jar, should reside locally, and on execution you provide its absolute or relative local path to the command.
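If you are unsure of the driver class name to pass after the jar, listing the jar contents with the JDK's jar tool shows the compiled classes (assuming a JDK is installed):

jar tf wc.jar

In the Apache tutorial the driver class is simply WordCount.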

So the final command would look like

hadoop jar /path/where/the/jar/is/wc.jar ClassName /mapreduce/wordcount/input/ /examples/wordcount/output/
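For instance, with the paths above and the tutorial's WordCount class (the jar location here is assumed from the question), the invocation would look roughly like:

hadoop jar ~/projects/hadoop/WordCount/wc.jar WordCount /mapreduce/wordcount/input/ /examples/wordcount/output/

and once the job finishes, the result can be read back with something like:

hdfs dfs -cat /examples/wordcount/output/part-r-00000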

Upvotes: 1

ravi

Reputation: 1088

As the Hadoop InvalidInputException suggests, it cannot find the location "/home/ubuntu/projects/hadoop/WordCount/input".

Is it a local or an HDFS path? I think it is a local path, and that is why the input exception is happening.

To execute a jar file you have to put the jar in an HDFS directory. The input and output directories also have to be in HDFS.

Use the copyFromLocal command to copy the jar from the local filesystem to a Hadoop directory:

hadoop fs -copyFromLocal <localsrc>/wc.jar hadoop-dir
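A minimal concrete sketch with the path from the question (note the slash after the tilde, ~/projects..., which was missing in the command that produced the error above):

hadoop fs -copyFromLocal ~/projects/hadoop/WordCount/wc.jar /tmp/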

Upvotes: 1
