Reputation: 393
While running the JAR file on the Hadoop cluster, I am getting the error below:
#hadoop jar WordCountNew.jar WordCountNew /MRInput57/Input-Big.txt /MROutput57
15/11/06 19:46:32 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/11/06 19:46:32 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:8020/var/lib/hadoop-0.20/cache/mapred/mapred/staging/root/.staging/job_201511061734_0003
15/11/06 19:46:32 ERROR security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /MRInput57/Input-Big.txt already exists
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /MRInput57/Input-Big.txt already exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:132)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:921)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:882)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:882)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:526)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:556)
at MapReduce.WordCountNew.main(WordCountNew.java:114)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
My driver class program is as below:
public static void main(String[] args) throws IOException, Exception {
    // Configuration details w.r.t. the job and JAR file
    Configuration conf = new Configuration();
    Job job = new Job(conf, "WORDCOUNTJOB");
    // Setting the driver class
    job.setJarByClass(MapReduceWordCount.class);
    // Setting the Mapper class
    job.setMapperClass(TokenizerMapper.class);
    // Setting the Combiner class
    job.setCombinerClass(IntSumReducer.class);
    // Setting the Reducer class
    job.setReducerClass(IntSumReducer.class);
    // Setting the output key class
    job.setOutputKeyClass(Text.class);
    // Setting the output value class
    job.setOutputValueClass(IntWritable.class);
    // Adding the input path
    FileInputFormat.addInputPath(job, new Path(args[0]));
    // Setting the output path
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // Exit with the job's completion status
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}
Can someone please point out the issue in my code?
Regards, Pranav
Upvotes: 0
Views: 900
Reputation: 3602
As others have noted, you are getting the error because the output directory already exists, most likely because you have tried executing this job before.
You can remove the existing output directory right before you run the job, i.e.:
#hadoop fs -rm -r /MROutput57 && \
hadoop jar WordCountNew.jar WordCountNew /MRInput57/Input-Big.txt /MROutput57
Upvotes: 0
Reputation: 38910
The output directory must not exist prior to execution of the program. Either delete the existing directory, point the job at a new directory, or remove the output directory from within your program.
I prefer deleting the output directory from the command prompt before executing the program.
From command prompt:
hdfs dfs -rm -r <your_output_directory_HDFS_URL>
From java:
Chris Gerken's code below is good enough.
Upvotes: 0
Reputation: 1
The output directory that you are trying to create to store the output is already present. So either delete the previous directory of the same name or change the name of the output directory.
Upvotes: 0
Reputation: 16392
You need to check whether the output directory already exists and delete it if it does. MapReduce can't (or won't) write files to a directory that already exists; it needs to create the directory itself to be sure it is starting clean.
Add this:
Path outPath = new Path(args[1]);
FileSystem dfs = FileSystem.get(outPath.toUri(), conf);
if (dfs.exists(outPath)) {
    // recursively delete the stale output directory
    dfs.delete(outPath, true);
}
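For anyone who wants to try that check-and-delete logic without a cluster, here is the same pattern sketched with only the JDK (`java.nio.file` stands in for Hadoop's `FileSystem`; the class and method names below are illustrative, not part of the Hadoop API):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class OutputDirCleaner {
    // Recursively delete the output directory if it exists,
    // mirroring dfs.delete(outPath, true) in the Hadoop snippet above.
    public static void deleteIfExists(Path outPath) throws IOException {
        if (!Files.exists(outPath)) {
            return; // nothing to do, just like the dfs.exists() guard
        }
        try (Stream<Path> walk = Files.walk(outPath)) {
            // delete children before parents (deepest paths first)
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        // simulate a stale output directory with one part file in it
        Path out = Files.createTempDirectory("MROutput57");
        Files.write(out.resolve("part-00000"), "hello\t1\n".getBytes());
        deleteIfExists(out);
        System.out.println(Files.exists(out) ? "still there" : "deleted");
    }
}
```

Running this deletes the simulated directory and prints `deleted`; calling `deleteIfExists` again is a no-op, just as the Hadoop version silently skips a missing path.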
Upvotes: 2