schoon

Reputation: 3334

Hadoop path adding %2F

I have a file in Hadoop: /home/hduser/IH/input/imageslocalpaths.txt (I've checked that it is there using hadoop fs -ls IH/input/imageslocalpaths.txt). When I run:

hadoop jar IH.jar IH/input/imageslocalpaths.txt

I get:

Input path does not exist: hdfs://localhost:54310/user/hduser/IH%2Finput%2Fimageslocalpaths.txt

Can anyone tell me how to stop Hadoop changing slashes to %2F, or suggest another workaround?

(I've tried the full path, but Hadoop just appends it to /user/hduser, giving /user/hduser/user/hduser..., still with %2F as well.)

As requested, here is my main (do you want the other bits?):

    public static void main(String[] args) throws Exception {

        Configuration conf = new Configuration();
        Configuration conf2 = new Configuration();

        conf.set("fs.defaultFS", "hdfs://localhost:54310");

        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();

        Job job1 = new Job(conf, "MergeImages");

        job1.setJarByClass(ImageHandlerMain.class);
        job1.setMapperClass(BinaryFilesToHadoopSequenceFileMapper.class);
        job1.setOutputKeyClass(Text.class);
        job1.setOutputValueClass(BytesWritable.class);

        FileInputFormat.addInputPath(job1, new Path(URLEncoder.encode(otherArgs[0],"UTF-8")));
        job1.setInputFormatClass(TextInputFormat.class);     

        FileOutputFormat.setOutputPath(job1, new Path(URLEncoder.encode(otherArgs[1],"UTF-8"))); //put result into intermediate folder
        job1.setInputFormatClass(TextInputFormat.class);
        job1.setOutputFormatClass(SequenceFileOutputFormat.class);
        ControlledJob cJob1 = new ControlledJob(conf);
        cJob1.setJob(job1);

        Job job2 = new Job(conf2,"FindDuplicates");

        job2.setJarByClass(ImageHandlerMain.class);
        job2.setMapperClass(ImagePHashMapper.class); 
        job2.setReducerClass(ImageDupsReducer.class);
        job2.setOutputKeyClass(Text.class);
        job2.setOutputValueClass(Text.class);        
        FileInputFormat.addInputPath(job2, new Path(URLEncoder.encode(otherArgs[1],"UTF-8") + "/part-r-00000")); //get the part-r-00000 file from the intermediate folder
        FileOutputFormat.setOutputPath(job2, new  Path(otherArgs[2])); //put result into output folder
        job2.setInputFormatClass(SequenceFileInputFormat.class);
        ControlledJob cJob2 = new ControlledJob(conf2);
        cJob2.setJob(job2);
        JobControl jobctrl = new JobControl("jobctrl");
        jobctrl.addJob(cJob1);
        jobctrl.addJob(cJob2);
        cJob2.addDependingJob(cJob1);
        jobctrl.run();


    }

Upvotes: 0

Views: 595

Answers (2)

Mikel Urkia

Reputation: 2095

I am not sure where the problem might come from, but try checking the following:

  • Check whether the path is still correctly formatted after the arguments are parsed in:

String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();

  • Try creating the path without the URL encoder, like this:

FileInputFormat.setInputPaths(job, new Path(inputLocation)); //where inputLocation is just a String

Upvotes: 0

Rajen Raiyarela

Reputation: 5634

The issue is in this line of code:

FileInputFormat.addInputPath(job2, new Path(URLEncoder.encode(otherArgs[1],"UTF-8") + "/part-r-00000")); //get the part-r-00000 file from the intermediate folder

Because you use URLEncoder.encode when creating the path, every "/" is converted to %2F. The same applies to the job1 input and output paths, which are also built with URLEncoder.encode.
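To see the effect in isolation, here is a minimal sketch that uses only the JDK and no Hadoop classes. The class name SlashEncodingDemo and the helper encode are made up for illustration; the input string is the relative path from the question.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class, not part of the asker's job.
class SlashEncodingDemo {

    // Wraps URLEncoder.encode so callers need not handle the checked exception.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a standard JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String raw = "IH/input/imageslocalpaths.txt";
        // Every "/" is percent-encoded, so the result is no longer a usable path.
        System.out.println(encode(raw)); // prints IH%2Finput%2Fimageslocalpaths.txt
    }
}
```

The printed value matches the tail of the path in the "Input path does not exist" message, which suggests the encoding happens in the driver code before Hadoop ever sees the path.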

Possible workaround:

FileInputFormat.addInputPath(job2, new Path(URLEncoder.encode(otherArgs[1],"UTF-8").replace("%2F", "/") + "/part-r-00000")); //get the part-r-00000 file from the intermediate folder

After encoding, just convert "%2F" back to "/" with the replace method.
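One caveat worth checking before relying on this: replace only undoes the slash encoding, so any other character that URLEncoder changes stays changed. The sketch below is JDK-only; the class name ReplaceWorkaroundDemo, the helper encodeThenRestoreSlashes and the path "my images/input" are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class showing what the encode-then-replace workaround returns.
class ReplaceWorkaroundDemo {

    // Encodes the string, then turns "%2F" back into "/" as in the workaround.
    static String encodeThenRestoreSlashes(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8").replace("%2F", "/");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a standard JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Letters, digits, "." and "-" survive, so this path round-trips.
        System.out.println(encodeThenRestoreSlashes("IH/input/imageslocalpaths.txt"));
        // A space is encoded as "+", which the replace call does not undo.
        System.out.println(encodeThenRestoreSlashes("my images/input"));
    }
}
```

For the paths in the question the workaround gives back the original string, so it behaves the same as passing otherArgs[1] to new Path directly. For names containing spaces or other encoded characters the two differ, so skipping URLEncoder altogether may be the simpler option.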

Upvotes: 1
