KarelV
KarelV

Reputation: 507

Iterate map and reduce operations

I'm writing a Hadoop application calculates map data at a certain resolution. My Input files are tiles of a map, named according to the QuadTile principle. I need to subsample those, and stitch those together until I have a certain higher-level tile which covers a larger area but at a lower resolution. Like zooming out in google maps.

Currently my Mapper subsamples tiles and my reducer combines tiles a a certain level and forms tiles of one level up. So for so good. But depending on which tile I need, I need to repeat those map and reduce steps a x times, which I have not been able to do so far.

What would be the best way to do so? Is it possible without explicitly saving the tiles in some temp directory and starting a new mapreduce Job on those temp dirs until I get what I want? What I think would be the perfect solution is something roughly like 'while(context.hasMoreThanOneKey()){iterate mapreduce}'.

Following an answer, I have now written a class TileJob which extends Job. However, the mapreduce is still not chained. Could you tell me what I'm doing wrong?

public boolean waitForCompletion(boolean verbose) throws IOException, InterruptedException, ClassNotFoundException{

    if(desiredkeylength != currentinputkeylength-1){            
        System.out.println("In loop, setting input at " + tempout);
        String tempin = tempout;
        FileInputFormat.setInputPaths(this, tempin);            
        tempout = (output + currentinputkeylength + "/");
        FileOutputFormat.setOutputPath(this, new Path(tempout));
        System.out.println("Setting output at " + tempout);
        currentinputkeylength--;
        Configuration conf = new Configuration();
        TileJob job = new TileJob(conf);
        job.setJobName(getJobName());
        job.setUpJob(tempin, tempout, tiletogenerate, currentinputkeylength);       
         return job.waitForCompletion(verbose);

    }else{
        //desiredkeylength == currentkeylength-1
        System.out.println("In else, setting input at " + tempout);

        String tempin = tempout;
        FileInputFormat.setInputPaths(this, tempin);            
        tempout = output;
        FileOutputFormat.setOutputPath(this, new Path(tempout));
        System.out.println("Setting output at " + tempout);
        currentinputkeylength--;
        Configuration conf = new Configuration();
        TileJob job = new TileJob(conf);
        job.setJobName(getJobName());
        job.setUpJob(tempin, tempout, tiletogenerate, currentinputkeylength);
        currentinputkeylength--;

        return super.waitForCompletion(verbose);
    }   

}

Upvotes: 2

Views: 1616

Answers (1)

Chris Gerken
Chris Gerken

Reputation: 16392

Usually you kick a mapreduce step off by having a driver class main method that configures the Job, Configuration and format type (input and output). Once everything's ready to go that main method calls Job::waitForCompletion() which submits the job and waits for the job to complete before continuing.

You can wrap some of that logic in a loop that repeatedly calls Job::waitForCompletion() until your criteria is met. You can implement your criteria using counters. Put logic into your reduce() method to set or increment a counter with the number of keys. Your loop in the driver class can get the value of that (distributed) counter from the Job instance and you code your while expression using that value.

What file locations you use is up to you. Inside this driver loop you can change the file location for the inputs and outputs, or keep them the same.

I should probably add that you ought to go ahead and create a new Job and Configuration instance inside the loop. I don't know that those objects are reusable in this situation.

public static void main(String[] args) {
    int keys = 2;
    boolean completed = true;
    while (completed & (keys > 1)) {

        Job job = new Job();

            // Do all your job configuration here

        completed = job.waitForCompletion();
        if (completed) {
            keys = job.getCounter().findCounter("Total","Keys").getValue();
        }
    }

}

Upvotes: 1

Related Questions