Reputation: 4191
I recently started learning Hadoop. Now, I want to open a file on the local disk and write some data to it in the reduce function, but I couldn't find a good way to close that file.
As far as I know, closing and re-opening it is not a good idea, so I don't want to do that.
import java.io.BufferedWriter;
import java.io.IOException;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.util.Tool;

public class MyClass extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // all configurations here
        Job job = Job.getInstance(getConf());
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
        // does something
    }

    static class MyReducer extends Reducer<Text, Text, Text, Text> {
        // create file, BufferedWriter etc. here
        private BufferedWriter bw;

        public MyReducer() {
            // open a file here
        }

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // write to file here
            bw.write("entered the reduce task for " + key);
            for (Text value : values) {
                bw.write(value + " will be written to my file \n");
            }
        }
    }
}
The workflow is going to be like the following (correct me if I'm wrong):

for (each reduce task)
    write to file: "entered the reduce task for " + *key*
    for (each *value* for that *key*)
        write *value*
I want to write the key/value pairs to myfile on the local disk and then close the file, but I can't find a good solution to that problem. Or will it even be a problem if I don't close the file, i.e. does Hadoop take care of that?
Thanks,
Upvotes: 1
Views: 238
Reputation: 30089
Both the Mapper and Reducer classes you're extending have methods to run code before and after you process the data:

the setup(Context context) method
the cleanup(Context context) method

So in your case you can override the cleanup method to close the file out (you'll need to maintain an instance variable in the reducer for the open stream).
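For example, here's a minimal sketch of that pattern (the local path and the field name bw are placeholders I've made up, not anything Hadoop prescribes):

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducer extends Reducer<Text, Text, Text, Text> {
    private BufferedWriter bw;    // instance variable holding the open stream

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // called once per reduce task, before any reduce() call: open the file here
        bw = new BufferedWriter(new FileWriter("/tmp/myfile.txt"));  // placeholder path
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        bw.write("entered the reduce task for " + key);
        for (Text value : values) {
            bw.write(value + " will be written to my file \n");
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // called once per reduce task, after the last reduce() call: close the file here
        bw.close();
    }
}

Since setup and cleanup each run once per task, the file is opened and closed exactly once no matter how many keys the task processes.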
Note that upon a failure / exception in your reduce method, your cleanup method will not be called (unless you override the reduce method itself to trap exceptions, close the stream, and then re-throw the exception).
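A sketch of that guard, under the same assumptions as above:

@Override
protected void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
    try {
        bw.write("entered the reduce task for " + key);
        for (Text value : values) {
            bw.write(value + " will be written to my file \n");
        }
    } catch (IOException e) {
        bw.close();    // close the stream before re-throwing so the file isn't left open
        throw e;
    }
}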
Upvotes: 1