sawai singh

Reputation: 57

hadoop wordcount with java

Hello everyone. I am very new to Hadoop. This is my first program, and I need help solving the errors below.

When I put my file into HDFS directly, without using the hdfs://localhost:9000/ prefix, I get an error saying the directory does not exist.

So I put the file into HDFS in the following way:

hadoop fs -put file.txt  hdfs://localhost:9000/sawai.txt

After this, the file is loaded into HDFS and I see:

    File added successfully

  1. OK, then I tried to run my wordcount jar file like this:

    hadoop jar wordcount.jar hdp.WordCount sawai.txt outputdir

    and I get the following error message:

    org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:9000/user/hadoop_usr/sawai.txt
    
  2. Then I tried another way and specified the full HDFS path, like this:

    hadoop jar wordcount.jar hdp.WordCount hdfs://localhost:9000/sawai.txt hdfs://localhost:9000/outputdir

    and I get the following error message:

    org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:9000/sawai.txt already exists
        at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:131) 
        at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:268) 
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139) 
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290) 
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287) 
        at java.security.AccessController.doPrivileged(Native Method) 
        at javax.security.auth.Subject.doAs(Subject.java:422) 
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698) 
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287) 
        at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575) 
        at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570) 
        at java.security.AccessController.doPrivileged(Native Method) 
        at javax.security.auth.Subject.doAs(Subject.java:422) 
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698) 
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570) 
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561) 
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:870) 
        at hdp.WordCount.run(WordCount.java:40) 
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) 
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) 
        at hdp.WordCount.main(WordCount.java:17) 
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
        at java.lang.reflect.Method.invoke(Method.java:498) 
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221) 
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
    

I have read many articles, and they suggest changing the output directory name on every run. I tried that, but it does not work in my case; the problem seems to be in how the source file name, the one the job should operate on, is specified.

What is causing the exception and how can I solve it?

Upvotes: 0

Views: 438

Answers (2)

SparkleGoat

Reputation: 513

Have you tried the following? HDFS prefers full (absolute) paths.

    hadoop jar wordcount.jar hdp.WordCount /sawai.txt /outputdir

Also, I have never had to prepend "hdfs://localhost:9000/" to upload a file to HDFS or run a jar. Usually you can just reference the full file path and it works fine. Maybe try it without that prefix?
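For context on why the relative path in the question failed: a path argument without a scheme or a leading slash is resolved against the user's HDFS home directory, typically /user/&lt;username&gt;. The rule can be sketched in plain Java (no Hadoop dependency; the class and method names here are illustrative only):

```java
// Illustrative sketch of how a path argument without a scheme or leading
// slash ends up under /user/<username> in HDFS. Plain Java, no Hadoop needed.
public class PathResolution {
    static String resolve(String path, String user) {
        if (path.startsWith("hdfs://") || path.startsWith("/")) {
            return path;                      // fully qualified or absolute: used as-is
        }
        return "/user/" + user + "/" + path;  // relative: resolved against the home dir
    }

    public static void main(String[] args) {
        // This is why "sawai.txt" became hdfs://localhost:9000/user/hadoop_usr/sawai.txt:
        System.out.println(resolve("sawai.txt", "hadoop_usr"));  // /user/hadoop_usr/sawai.txt
        System.out.println(resolve("/sawai.txt", "hadoop_usr")); // /sawai.txt
    }
}
```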

If that does not fix it: it is also best practice to increase the replication factor to three. In addition, your file is significantly smaller than the block size, and that can become problematic; see Cloudera's advice on file and block sizes: http://blog.cloudera.com/blog/2009/02/the-small-files-problem

Upvotes: 1

Ram Ghadiyaram

Reputation: 29165

I haven't seen your complete program with its input/output handling.

I think sawai.txt is the input file whose words you want to count. Why are you passing it as the output path?

However, see this example and add it to your driver. If the output path already exists, it deletes it first, so you won't get a FileAlreadyExistsException:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    /* Provides access to configuration parameters */
    Configuration conf = new Configuration();
    /* Create a FileSystem object from the configuration */
    FileSystem fs = FileSystem.get(conf);
    /* Check whether the output path (args[1]) exists */
    if (fs.exists(new Path(args[1]))) {
       /* If it exists, delete it recursively */
       fs.delete(new Path(args[1]), true);
    }
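If you want to try the idea without a running cluster, the same "delete the output directory before the job starts" guard can be mimicked on the local filesystem with plain java.io. This is illustrative only; in the actual driver you should use the Hadoop FileSystem API as shown above:

```java
import java.io.File;

public class OutputGuard {
    // Recursively delete a directory so a re-run never fails with
    // "output directory already exists" (local-filesystem analogue
    // of the HDFS fs.exists/fs.delete snippet in the answer).
    static void deleteRecursively(File f) {
        File[] children = f.listFiles();
        if (children != null) {
            for (File c : children) deleteRecursively(c);
        }
        f.delete();
    }

    public static void main(String[] args) {
        File out = new File("outputdir");
        out.mkdirs();                       // simulate a leftover output dir
        if (out.exists()) {
            deleteRecursively(out);         // guard: clear it before the job runs
        }
        System.out.println(out.exists());   // false: safe to submit the job now
    }
}
```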

Upvotes: 1
