Slinky

Reputation: 5832

Hadoop cannot see my input directory

I am following the Apache MapReduce tutorial and I am at the point of assigning input and output directories. I created both directories here:

~/projects/hadoop/WordCount/input/
~/projects/hadoop/WordCount/output/

but when I run the hadoop fs command, the file and directory are not found. I am running as the ubuntu user, which owns the directories and the input file.

Based on a proposed solution below, I then tried:

Found my HDFS directory with hdfs dfs -ls /, which is /tmp

Created input/ and output/ inside /tmp with mkdir (roughly the commands sketched below)
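In case it helps, those steps were along the lines of the following commands (reconstructed from memory, so the exact form may have differed slightly):

hdfs dfs -ls /
hdfs dfs -mkdir /tmp/input
hdfs dfs -mkdir /tmp/output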

Tried to copy the local .jar to HDFS:

hadoop fs -copyFromLocal ~projects/hadoop/WordCount/wc.jar /tmp

Received:

copyFromLocal: `~projects/hadoop/WordCount/wc.jar': No such file or directory


Any troubleshooting ideas? Thanks

Upvotes: 0

Views: 3062

Answers (2)

franklinsijo

Reputation: 18270

MapReduce expects the input and output paths to be directories in HDFS, not local paths, unless the cluster is configured in local mode. Also, the input directory must already exist and the output directory must not.
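A quick way to check which filesystem the cluster resolves job paths against (assuming a standard installation with the hdfs command on the PATH) is to query the fs.defaultFS property:

hdfs getconf -confKey fs.defaultFS

If this prints an hdfs:// URI, paths passed to the job are resolved in HDFS; if it prints file:///, the cluster is running against the local filesystem.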

For example:

If the input is /mapreduce/wordcount/input/, this directory must be created with all the input files in it. Use HDFS commands to create them:

hdfs dfs -mkdir -p /mapreduce/wordcount/input/
hdfs dfs -copyFromLocal file1 file2 file3 /mapreduce/wordcount/input/

where file1, file2 and file3 are locally available input files.
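To confirm the files actually landed in HDFS, listing the directory should show them:

hdfs dfs -ls /mapreduce/wordcount/input/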

And if the output is /examples/wordcount/output/, the parent directories must exist but not the output/ directory itself; Hadoop creates it on job execution.

hdfs dfs -mkdir -p /examples/wordcount/
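If a previous run already created /examples/wordcount/output/, the job will fail complaining that the output directory exists; remove it before rerunning:

hdfs dfs -rm -r /examples/wordcount/output/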

The jar used for the job, in this case wc.jar, should reside locally, and on execution you provide its absolute or relative local path to the command.
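If you are unsure of the driver class name to pass after the jar, listing the jar contents with the JDK's jar tool shows the compiled classes (assuming a JDK is installed):

jar tf wc.jar

In the Apache tutorial the driver class is simply WordCount.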

So the final command would look like

hadoop jar /path/where/the/jar/is/wc.jar ClassName /mapreduce/wordcount/input/ /examples/wordcount/output/
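For instance, with the paths above and the tutorial's WordCount class (the jar location here is assumed from the question), the invocation would look roughly like:

hadoop jar ~/projects/hadoop/WordCount/wc.jar WordCount /mapreduce/wordcount/input/ /examples/wordcount/output/

and once the job finishes, the result can be read back with something like:

hdfs dfs -cat /examples/wordcount/output/part-r-00000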

Upvotes: 1

ravi

Reputation: 1088

As the Hadoop InvalidInputException suggests, it cannot find the location "/home/ubuntu/projects/hadoop/WordCount/input".

Is it a local or an HDFS path? I think it is a local path, and that is why the input exception is happening.

To execute a jar file you have to put the jar in an HDFS directory. The input and output directories also have to be in HDFS.

Use the copyFromLocal command to copy the jar from the local filesystem to a Hadoop directory:

hadoop fs -copyFromLocal <localsrc>/wc.jar hadoop-dir
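A minimal concrete sketch with the path from the question (note the slash after the tilde, ~/projects..., which was missing in the command that produced the error above):

hadoop fs -copyFromLocal ~/projects/hadoop/WordCount/wc.jar /tmp/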

Upvotes: 1
