Reputation: 4191
I recently started learning Hadoop. Now, I want to open a file on the local disk and write some data to it in the reduce function, but I couldn't find a good way to close that file.
As far as I know, closing and re-opening it is not a good idea, so I don't want to do that.
import java.io.BufferedWriter;
import java.io.IOException;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.util.Tool;

public class MyClass extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // all configurations here
        Job job = Job.getInstance(getConf());
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
        // does something
    }

    static class MyReducer extends Reducer<Text, Text, Text, Text> {
        // create file, BufferedWriter etc. here
        private BufferedWriter bw;

        public MyReducer() {
            // open a file here
        }

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // write to file here
            bw.write("entered the reduce task for " + key);
            for (Text value : values) {
                bw.write(value + " will be written to my file \n");
            }
        }
    }
}
The workflow is going to be like the following (correct me if I'm wrong):

for (each reduce task)
    write to file: "entered the reduce task for " + *key*
    for (each *value* for that *key*)
        write *value*
I want to write the key/value pairs to myfile on the local disk and then close the file, but I can't find a good solution to that problem. Or will it even be a problem if I don't close the file, i.e. does Hadoop take care of that?
Thanks,
Upvotes: 1
Views: 238
Reputation: 30089
Both the Mapper and Reducer classes you're extending have methods to run code before and after you process the data:

the setup(Context context) method
the cleanup(Context context) method

So in your case you can override the cleanup method to close the file out (you'll need to maintain an instance variable in the reducer for the open stream).
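For example, here's a minimal sketch of that pattern (the local path and the field name bw are placeholders I've made up, not anything Hadoop prescribes):

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducer extends Reducer<Text, Text, Text, Text> {
    private BufferedWriter bw;    // instance variable holding the open stream

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // called once per reduce task, before any reduce() call: open the file here
        bw = new BufferedWriter(new FileWriter("/tmp/myfile.txt"));  // placeholder path
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        bw.write("entered the reduce task for " + key);
        for (Text value : values) {
            bw.write(value + " will be written to my file \n");
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // called once per reduce task, after the last reduce() call: close the file here
        bw.close();
    }
}

Since setup and cleanup each run once per task, the file is opened and closed exactly once no matter how many keys the task processes.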
Note that upon a failure / exception in your reduce method, your cleanup method will not be called (unless you override the reduce method itself to trap exceptions, close the stream, and then re-throw the exception).
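A sketch of that guard, under the same assumptions as above:

@Override
protected void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
    try {
        bw.write("entered the reduce task for " + key);
        for (Text value : values) {
            bw.write(value + " will be written to my file \n");
        }
    } catch (IOException e) {
        bw.close();    // close the stream before re-throwing so the file isn't left open
        throw e;
    }
}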
Upvotes: 1