Pranav

Reputation: 393

Hadoop jar command error

While executing the jar command against HDFS, I am getting the error below:

#hadoop jar WordCountNew.jar WordCountNew /MRInput57/Input-Big.txt /MROutput57
15/11/06 19:46:32 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/11/06 19:46:32 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:8020/var/lib/hadoop-0.20/cache/mapred/mapred/staging/root/.staging/job_201511061734_0003
15/11/06 19:46:32 ERROR security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /MRInput57/Input-Big.txt already exists
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /MRInput57/Input-Big.txt already exists
    at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:132)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:921)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:882)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:882)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:526)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:556)
    at MapReduce.WordCountNew.main(WordCountNew.java:114)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:197)


My driver class program is as follows:

    public static void main(String[] args) throws Exception {
        // Configuration details w.r.t. the Job and jar file
        Configuration conf = new Configuration();
        Job job = new Job(conf, "WORDCOUNTJOB");

        // Setting Driver class
        job.setJarByClass(MapReduceWordCount.class);
        // Setting the Mapper class
        job.setMapperClass(TokenizerMapper.class);
        // Setting the Combiner class
        job.setCombinerClass(IntSumReducer.class);
        // Setting the Reducer class
        job.setReducerClass(IntSumReducer.class);
        // Setting the Output Key class
        job.setOutputKeyClass(Text.class);
        // Setting the Output value class
        job.setOutputValueClass(IntWritable.class);
        // Adding the Input path
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // Setting the output path
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // System exit strategy
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

Can someone please rectify the issue in my code?

Regards, Pranav

Upvotes: 0

Views: 900

Answers (4)

naimdjon

Reputation: 3602

As others have noted, you are getting the error because the output directory already exists, most likely because you have tried executing this job before.

You can remove the existing output directory right before you run the job, i.e.:

#hadoop fs -rm -r /MROutput57 && \
hadoop jar WordCountNew.jar WordCountNew /MRInput57/Input-Big.txt /MROutput57

Upvotes: 0

Ravindra babu

Reputation: 38910

The output directory must not exist before the program runs. Either delete the existing directory, supply a new output directory, or delete the output directory from within your program.

I prefer deleting the output directory from the command prompt before running the program.

From command prompt:

hdfs dfs -rm -r <your_output_directory_HDFS_URL>
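
With the output path used in the question, that would be, for example:

hdfs dfs -rm -r /MROutput57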

From Java:

Chris Gerken's code (in the answer below) is good enough.

Upvotes: 0

Prashant Patil

Reputation: 1

The output directory you are trying to create to store the output already exists. Either delete the previous directory of the same name or change the name of the output directory.

Upvotes: 0

Chris Gerken

Reputation: 16392

You need to check whether the output directory already exists and delete it if it does. MapReduce won't write to an output directory that already exists; it insists on creating the directory itself so it can be sure it isn't mixing new results with leftovers from an earlier run.

Add this:

Path outPath = new Path(args[1]);
// Delete the output directory recursively if a previous run left it behind.
FileSystem dfs = FileSystem.get(outPath.toUri(), conf);
if (dfs.exists(outPath)) {
    dfs.delete(outPath, true);
}
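
For reference, here is a minimal sketch of where this check could sit in the driver posted in the question. The class name WordCountDriverSketch is invented for illustration, and the mapper/reducer setup lines elided in the comment are the ones from the question's own code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriverSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "WORDCOUNTJOB");
        // Mapper, Combiner, Reducer and output key/value setup as in the question ...

        FileInputFormat.addInputPath(job, new Path(args[0]));

        // Clear a leftover output directory from a previous run before setting
        // the output path, so checkOutputSpecs does not abort the job.
        Path outPath = new Path(args[1]);
        FileSystem dfs = FileSystem.get(outPath.toUri(), conf);
        if (dfs.exists(outPath)) {
            dfs.delete(outPath, true);
        }
        FileOutputFormat.setOutputPath(job, outPath);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Deleting the directory programmatically is convenient for repeated test runs, but note that it silently discards the previous run's output; deleting it manually from the command line keeps that step explicit.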

Upvotes: 2
