Reputation: 572

org.apache.hadoop.mapred.FileAlreadyExistsException

I was trying to run the example program in Hadoop given here

When I try to run it, I get an org.apache.hadoop.mapred.FileAlreadyExistsException:

emil@psycho-O:~/project/hadoop-0.20.2$ bin/hadoop jar jar_files/wordcount.jar org.myorg.WordCount jar_files/wordcount/input jar_files/wordcount/output
11/02/06 14:54:23 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
11/02/06 14:54:23 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/home/emil/project/hadoop-0.20.2/jar_files/wordcount/input already exists
    at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:111)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:772)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
    at org.myorg.WordCount.main(WordCount.java:55)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
emil@psycho-O:~/project/hadoop-0.20.2$

My input files, file01 and file02, are in /home/emil/project/hadoop-0.20.2/jar_files/wordcount/input. When I googled the error, I found that this check exists to prevent the same job from being re-run over existing output. But in my case it is the input directory that is causing the exception. Is there anything wrong with my command? I don't see any posts with the same error for the WordCount example. I am a newbie in Java.

What could be the reason for this?

Upvotes: 14

Answers (5)

usr12345

Reputation: 1

Yes. I ran into the same problem. When I removed org.myorg.WordCount it worked just fine.

Edit:

FileInputFormat.setInputPaths(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));

The only arguments the job expects are the input path and the output path.

Upvotes: -1

Amir S

Reputation: 41

This is to prevent overwriting previous results. You can clean up by deleting the output path when creating and setting up the job:

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    TextInputFormat.addInputPath(job, new Path(args[0]));
    // Recursively delete any existing output directory so the job can recreate it.
    FileSystem.get(conf).delete(new Path(args[1]), true);
    TextOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}
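
One caveat with deleting the output path unconditionally: if the arguments are shifted, as in the question, args[1] may actually be the input directory. The sketch below shows a guard for that case. It is only a local-filesystem analogue written with java.nio.file, not the Hadoop FileSystem API, and the class and method names (OutputDirCleaner, deleteOutputIfSafe) are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: a local-filesystem sketch, not part of Hadoop.
final class OutputDirCleaner {

    // Deletes the output directory recursively, but refuses if the output
    // path is the input path or one of its ancestors.
    // Returns true if something was deleted, false if output did not exist.
    static boolean deleteOutputIfSafe(Path input, Path output) throws IOException {
        Path in = input.toAbsolutePath().normalize();
        Path out = output.toAbsolutePath().normalize();
        if (in.startsWith(out)) {
            throw new IllegalArgumentException(
                "Refusing to delete " + out + ": it is or contains the input path " + in);
        }
        if (!Files.exists(out)) {
            return false;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(out)) {
            // Reverse order puts children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
        return true;
    }
}
```

In a real job, the same idea would presumably be a comparison of the two paths before calling delete.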

Upvotes: 4

ChucK

Reputation: 2134

I just ran into this and I found I had to do both what Sandeep and Thomas said: use args[1] and args[2] in the sample code and ensure the output directory doesn't exist, despite what the example says.

Upvotes: 3

Sandeep Mukherjee

Reputation: 211

I faced the same problem. It took me a while to figure out what was going on. The main problem was that you cannot attach a debugger to see what values are being passed.

You are using args[0] as the input folder and args[1] as the output folder in your code.

Now, if you are using the new framework, where the command-line arguments are consumed inside the run method of the Tool class, args[0] is the name of the program being executed, which is WordCount in this case.

args[1] is then the input folder you specified, which the program treats as the output folder, and hence you are seeing the exception.

So the solution is:

use args[1] and args[2].
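
If you are not sure whether an extra leading argument (such as the class name) reaches your code, one defensive option is to read the two paths from the end of the argument list instead of from fixed indices. This is only a sketch under that assumption; the JobArgs class is hypothetical and not part of Hadoop or the WordCount example.

```java
// Hypothetical helper: picks input/output from the END of the argument list,
// so a leading class name (or other extra token) does not shift the paths.
final class JobArgs {
    final String input;
    final String output;

    private JobArgs(String input, String output) {
        this.input = input;
        this.output = output;
    }

    static JobArgs parse(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException("Usage: [extra args] <input path> <output path>");
        }
        // The last two arguments are taken as input and output, in that order.
        return new JobArgs(args[args.length - 2], args[args.length - 1]);
    }

    public static void main(String[] args) {
        JobArgs parsed = parse(args);
        System.out.println("input  = " + parsed.input);
        System.out.println("output = " + parsed.output);
    }
}
```

With this, both { "jar_files/wordcount/input", "jar_files/wordcount/output" } and { "org.myorg.WordCount", "jar_files/wordcount/input", "jar_files/wordcount/output" } resolve to the same input and output paths. The trade-off is that any trailing extra argument would be misread, so it only helps with extra leading tokens.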

Upvotes: 21

Thomas Jungblut

Reputation: 20969

You have to delete the output directory you are passing if the job has already run once.
This should work for you:

bin/hadoop fs -rmr jar_files/wordcount/output

EDIT
I misunderstood the asker; I thought this was about the wordcount example from Hadoop's examples jar. Could you please provide the source code of your class, org.myorg.WordCount?

Upvotes: 7
