Hadoop pseudo-distributed mode: Input path does not exist

Question

I am new to Hadoop. I ran my Hadoop application in standalone mode, and it worked fine. I then decided to move it to pseudo-distributed mode and made the configuration changes as described. Snippets of my XML files are shown below.

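My core-site.xml looks as follows:

    <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost/</value>
    </property>
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/tmp/hadoop-onur</value>
      <description>A base for other temporary directories.</description>
    </property>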

My hdfs-site.xml is:

    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>

And my mapred-site.xml is:

    <property>
      <name>mapred.job.tracker</name>
      <value>localhost:8021</value>
    </property>

I ran the start-dfs.sh and start-mapred.sh scripts, and both started fine:

root@vissu-desktop:/home/vissu/Raveesh/Hadoop# start-dfs.sh 
starting namenode, logging to /home/vissu/Raveesh/Hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-namenode-vissu-desktop.out
localhost: starting datanode, logging to /home/vissu/Raveesh/Hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-datanode-vissu-desktop.out
localhost: starting secondarynamenode, logging to /home/vissu/Raveesh/Hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-secondarynamenode-vissu-desktop.out
root@vissu-desktop:/home/vissu/Raveesh/Hadoop# start-mapred.sh 
starting jobtracker, logging to /home/vissu/Raveesh/Hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-jobtracker-vissu-desktop.out
localhost: starting tasktracker, logging to /home/vissu/Raveesh/Hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-tasktracker-vissu-desktop.out
root@vissu-desktop:/home/vissu/Raveesh/Hadoop#

Then I tried to run my application, but got the following error:

root@vissu-desktop:/home/vissu/Raveesh/Hadoop/hadoop-0.20.2# hadoop jar ResultAgg_plainjar.jar ProcessInputFile /home/vissu/Raveesh/VotingConfiguration/sample.txt 
ARG 0 obtained = ProcessInputFile
12/07/17 17:43:33 INFO preprocessing.ProcessInputFile: Modified File Name is /home/vissu/Raveesh/Hadoop/hadoop-0.20.2/sample.txt_modf
Going to process map reduce jobs
12/07/17 17:43:33 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/07/17 17:43:34 ERROR preprocessing.ProcessInputFile: Input path does not exist: hdfs://localhost/home/vissu/Raveesh/Hadoop/hadoop-0.20.2/sample.txt_modf
root@vissu-desktop:/home/vissu/Raveesh/Hadoop/hadoop-0.20.2#

The application first reads a file from a given path, modifies it, and writes the result to sample.txt_modf; the MapReduce job must then read this modified file. In standalone mode I gave the absolute path, and that worked. In pseudo-distributed mode, however, I can't figure out which path to give the Path API: whatever file path I pass, Hadoop prepends hdfs://localhost/ to it. Should I simply make sure the modified file is created at that location?

My question is how to specify the path.

The snippet that sets the paths is:

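        // Input path: the local working directory (user.dir) plus the modified file's name
        KeyValueTextInputFormat.addInputPath(conf,
                new Path(System.getProperty("user.dir") + File.separator + inputFileofhits.getName()));
        // Output path: read from the OUTPUT_DIRECTORY property
        FileOutputFormat.setOutputPath(
                conf,
                new Path(ProcessInputFile.resultAggProps
                        .getProperty("OUTPUT_DIRECTORY")));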

Thanks

Chris White · Accepted Answer

Does this file exist in HDFS? It looks like you've provided a local path to the file (user directories in HDFS are usually rooted at /user rather than /home).
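To see why the error names hdfs://localhost/home/..., it can help to model how a path with no scheme gets qualified: it takes the scheme and authority from fs.default.name and keeps its own absolute path. The sketch below is a simplified, stdlib-only model built on java.net.URI under that assumption, not Hadoop's own Path code; the class QualifyDemo and method qualify are hypothetical names.

```java
import java.net.URI;

// Hypothetical, stdlib-only model of how a scheme-less path is qualified
// against the default filesystem (fs.default.name). Not Hadoop's own code.
public class QualifyDemo {

    // Returns the path as it would look once qualified against defaultFs.
    static String qualify(String defaultFs, String path) {
        URI p = URI.create(path);
        if (p.getScheme() != null) {
            // Already carries a scheme (e.g. hdfs://... or file:///...): leave it alone.
            return p.toString();
        }
        // No scheme: borrow scheme and authority from the default filesystem,
        // so an absolute local path like /home/... becomes hdfs://localhost/home/...
        return URI.create(defaultFs).resolve(p).toString();
    }

    public static void main(String[] args) {
        String defaultFs = "hdfs://localhost/"; // fs.default.name from core-site.xml
        String local = "/home/vissu/Raveesh/Hadoop/hadoop-0.20.2/sample.txt_modf";
        System.out.println(qualify(defaultFs, local));
    }
}
```

Under this model the local file is never consulted: the path simply points into HDFS, which is why the file must actually exist there.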

You can check the file exists in HDFS by typing:

#> hadoop fs -ls hdfs://localhost/home/vissu/Raveesh/Hadoop/hadoop-0.20.2/sample.txt_modf

If the listing shows that the file is not in HDFS, copy it there with the hadoop fs -put command:

#> hadoop fs -put /home/vissu/Raveesh/Hadoop/hadoop-0.20.2/sample.txt_modf hdfs://localhost/user/vissu/Raveesh/Hadoop/hadoop-0.20.2/sample.txt_modf

Note here the path in HDFS is rooted at /user, not /home.
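The /user convention also matters for relative paths. Assuming the HDFS working directory defaults to the home directory /user/<user name> (for the root account used in the question, /user/root), a relative path such as sample.txt_modf would land under it, while an absolute /user/... path keeps its own root. The stdlib-only sketch below models that assumption with java.net.URI; HomeDirDemo, homeDir and resolve are hypothetical names, and this is not Hadoop's implementation.

```java
import java.net.URI;

// Hypothetical model: resolve paths against an assumed HDFS-style home
// directory (/user/<name>) on the default filesystem. Not Hadoop's own code.
public class HomeDirDemo {

    // Builds the assumed working directory, e.g. hdfs://localhost/user/root/
    static URI homeDir(String defaultFs, String userName) {
        // Trailing slash so a later resolve() appends instead of replacing the last segment.
        return URI.create(defaultFs).resolve("/user/" + userName + "/");
    }

    // Relative paths land under the home directory; absolute paths keep their own root.
    static String resolve(String defaultFs, String userName, String path) {
        return homeDir(defaultFs, userName).resolve(path).toString();
    }

    public static void main(String[] args) {
        String fs = "hdfs://localhost/";
        System.out.println(resolve(fs, "root", "sample.txt_modf"));
        System.out.println(resolve(fs, "root", "/user/vissu/Raveesh/Hadoop/hadoop-0.20.2/sample.txt_modf"));
    }
}
```

Either way, the path handed to addInputPath has to name the location the file was actually copied to in HDFS.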
