sawai singh

Reputation: 57

hadoop wordcount with java

Hello everyone. I am very new to Hadoop. This is my first program, and I need help solving the errors below.

When I put my file into HDFS directly, without using the hdfs://localhost:9000/ prefix, I get an error saying the directory does not exist.

So I put the file into HDFS in the following way:

hadoop fs -put file.txt  hdfs://localhost:9000/sawai.txt

After this, the file is loaded into HDFS and I see:

    File added successfully

  1. OK, then I tried to run my wordcount jar file like this:

    hadoop jar wordcount.jar hdp.WordCount sawai.txt outputdir

    and I get the following error message:

    org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:9000/user/hadoop_usr/sawai.txt
    
  2. Then I tried another way and specified the full HDFS path, like this:

    hadoop jar wordcount.jar hdp.WordCount hdfs://localhost:9000/sawai.txt hdfs://localhost:9000/outputdir

    and I get the following error message:

    org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:9000/sawai.txt already exists
        at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:131) 
        at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:268) 
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139) 
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290) 
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287) 
        at java.security.AccessController.doPrivileged(Native Method) 
        at javax.security.auth.Subject.doAs(Subject.java:422) 
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698) 
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287) 
        at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575) 
        at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570) 
        at java.security.AccessController.doPrivileged(Native Method) 
        at javax.security.auth.Subject.doAs(Subject.java:422) 
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698) 
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570) 
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561) 
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:870) 
        at hdp.WordCount.run(WordCount.java:40) 
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) 
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) 
        at hdp.WordCount.main(WordCount.java:17) 
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
        at java.lang.reflect.Method.invoke(Method.java:498) 
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221) 
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
    

I have read many articles, and they suggest changing the output directory name on every run. I tried that, but it does not work in my case; the problem seems to be in how the source file name, the one the job should operate on, is specified.

What is causing the exception and how can I solve it?

Upvotes: 0

Views: 438

Answers (2)

SparkleGoat

Reputation: 513

Have you tried the following? HDFS prefers full (absolute) paths.

    hadoop jar wordcount.jar hdp.WordCount /sawai.txt /outputdir

Also, I have never had to prepend "hdfs://localhost:9000/" to upload a file to HDFS or run a jar. Usually you can just reference the full file path and it works fine. Maybe try it without that prefix?
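For context on why the relative path in the question failed: a path argument without a scheme or a leading slash is resolved against the user's HDFS home directory, typically /user/&lt;username&gt;. The rule can be sketched in plain Java (no Hadoop dependency; the class and method names here are illustrative only):

```java
// Illustrative sketch of how a path argument without a scheme or leading
// slash ends up under /user/<username> in HDFS. Plain Java, no Hadoop needed.
public class PathResolution {
    static String resolve(String path, String user) {
        if (path.startsWith("hdfs://") || path.startsWith("/")) {
            return path;                      // fully qualified or absolute: used as-is
        }
        return "/user/" + user + "/" + path;  // relative: resolved against the home dir
    }

    public static void main(String[] args) {
        // This is why "sawai.txt" became hdfs://localhost:9000/user/hadoop_usr/sawai.txt:
        System.out.println(resolve("sawai.txt", "hadoop_usr"));  // /user/hadoop_usr/sawai.txt
        System.out.println(resolve("/sawai.txt", "hadoop_usr")); // /sawai.txt
    }
}
```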

If that does not fix it: it is also best practice to increase the replication factor to three. In addition, your file is significantly smaller than the block size, and that can become problematic; see Cloudera's advice on file and block sizes: http://blog.cloudera.com/blog/2009/02/the-small-files-problem

Upvotes: 1

Ram Ghadiyaram

Reputation: 29165

I haven't seen your complete program with its input/output handling.

I think sawai.txt is the input file whose words you want to count. Why are you passing it as the output path?

However, see this example and add it to your driver. If the output path already exists, it deletes it first, so you won't get a FileAlreadyExistsException:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    /* Provides access to configuration parameters */
    Configuration conf = new Configuration();
    /* Create a FileSystem object from the configuration */
    FileSystem fs = FileSystem.get(conf);
    /* Check whether the output path (args[1]) exists */
    if (fs.exists(new Path(args[1]))) {
       /* If it exists, delete it recursively */
       fs.delete(new Path(args[1]), true);
    }
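If you want to try the idea without a running cluster, the same "delete the output directory before the job starts" guard can be mimicked on the local filesystem with plain java.io. This is illustrative only; in the actual driver you should use the Hadoop FileSystem API as shown above:

```java
import java.io.File;

public class OutputGuard {
    // Recursively delete a directory so a re-run never fails with
    // "output directory already exists" (local-filesystem analogue
    // of the HDFS fs.exists/fs.delete snippet in the answer).
    static void deleteRecursively(File f) {
        File[] children = f.listFiles();
        if (children != null) {
            for (File c : children) deleteRecursively(c);
        }
        f.delete();
    }

    public static void main(String[] args) {
        File out = new File("outputdir");
        out.mkdirs();                       // simulate a leftover output dir
        if (out.exists()) {
            deleteRecursively(out);         // guard: clear it before the job runs
        }
        System.out.println(out.exists());   // false: safe to submit the job now
    }
}
```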

Upvotes: 1
