Reputation: 412
I'm building a custom output format for Hadoop and was wondering whether there is a way, inside the output format, to know when all reducers (RecordWriters) have completed.
To detect that a single RecordWriter has completed, I can use the RecordWriter's close method, but how can I run some cleanup once all of the RecordWriters have completed?
Upvotes: 1
Views: 414
Reputation: 3154
You can do the final cleanup in the driver itself instead of relying on the OutputFormat. I doubt it really provides such a feature (API). The finalize method may be a last resort, but it is not advisable at all.
The waitForCompletion method of Job returns only after the job finishes, so you can simply do:
    boolean status = job.waitForCompletion(true);
    if (status) {
        // clean up required for successful jobs
    } else {
        // clean up required for failed jobs
    }
If your cleanup does not depend on the job's success or failure, just remove the if-else part. And if you really need a method in your OutputFormat class to do the deletion, make it static, e.g.:
job.waitForCompletion(true);
CustomOutputFormat.cleanUp();
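As a rough illustration of the static-method approach, here is a minimal sketch that runs without Hadoop on the classpath. Everything in it is hypothetical: `CustomOutputFormatSketch` stands in for your output format class, `DriverSketch.runThenCleanUp` stands in for the driver, the `Callable` stands in for `() -> job.waitForCompletion(true)`, and the scratch directory is assumed to be on the local filesystem (for files on HDFS you would delete through Hadoop's `FileSystem` API instead). The `finally` block is an addition on my part so the cleanup also runs if the job call throws.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.concurrent.Callable;
import java.util.stream.Stream;

// Hypothetical stand-in for a custom OutputFormat; only the static cleanup hook is sketched.
class CustomOutputFormatSketch {

    // Deletes the scratch directory and everything below it.
    // Returns quietly if the directory is already gone, so calling it twice is harmless.
    static void cleanUp(Path scratchDir) throws IOException {
        if (!Files.exists(scratchDir)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(scratchDir)) {
            // Reverse order so that children are deleted before their parent directories.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }
}

// Hypothetical driver; jobRun stands in for () -> job.waitForCompletion(true).
class DriverSketch {

    static boolean runThenCleanUp(Callable<Boolean> jobRun, Path scratchDir) throws Exception {
        try {
            return jobRun.call();
        } finally {
            // Runs once, in the driver, after the job call has returned or thrown.
            CustomOutputFormatSketch.cleanUp(scratchDir);
        }
    }

    public static void main(String[] args) throws Exception {
        Path scratch = Files.createTempDirectory("job-scratch");
        Files.createFile(scratch.resolve("part-00000.tmp"));
        boolean ok = runThenCleanUp(() -> true, scratch);
        System.out.println("job succeeded: " + ok + ", scratch exists: " + Files.exists(scratch));
    }
}
```

Because the cleanup runs in the driver rather than in any task, it happens exactly once per job run, which is the point of the suggestion above.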
I hope this covers what you need.
Upvotes: 1