Reputation: 195
In Hadoop I have a Reducer that looks like this, transforming data from a prior mapper into a series of files of a type that is not InputFormat-compatible:
private LocalDatabase ld;

protected void setup(Context context) {
    ld = new LocalDatabase("localFilePath");
}

protected void reduce(BytesWritable key, Iterable&lt;Text&gt; values, Context context) {
    for (Text value : values) {
        ld.addValue(key, value);
    }
}

protected void cleanup(Context context) {
    saveLocalDatabaseInHDFS(ld);
}
I'm rewriting my application in Pig and can't figure out how this would be done in a Pig UDF, as there's no cleanup function or anything else to denote when the UDF has finished running. How can this be done in Pig?
Upvotes: 2
Views: 227
Reputation: 46
If you want something to run at the end of your UDF, override the finish() method. It will be called after all records have been processed by your UDF, once per map or reduce task, the same as the cleanup call in your Reducer.
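As a rough sketch (reusing the hypothetical LocalDatabase and saveLocalDatabaseInHDFS helpers from the question, which are assumptions here, not real Pig API), an EvalFunc using finish() might look like this:

```java
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class LocalDatabaseWriter extends EvalFunc<String> {
    private LocalDatabase ld;  // hypothetical helper from the question

    @Override
    public String exec(Tuple input) throws IOException {
        if (ld == null) {
            // EvalFunc has no setup() hook, so initialize lazily on first call
            ld = new LocalDatabase("localFilePath");
        }
        ld.addValue(input.get(0), input.get(1));
        return null;
    }

    @Override
    public void finish() {
        // Called once per task, after the last record has passed through exec()
        saveLocalDatabaseInHDFS(ld);  // hypothetical, as in the question
    }
}
```

Note that, unlike a Reducer, exec() is called once per input tuple, so any per-task state has to be initialized lazily rather than in a setup method.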
Upvotes: 1
Reputation: 30089
I would say you'd need to write a StoreFunc UDF, wrapping your own custom OutputFormat; then you'd have the ability to close out in the OutputFormat's RecordWriter.close() method.
This will create a database in HDFS for each reducer, however, so if you want everything in a single file you'll need to run with a single reducer, or run a secondary step to merge the databases together.
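A minimal sketch of that StoreFunc, again assuming the hypothetical LocalDatabase and saveLocalDatabaseInHDFS helpers from the question. The important part is that Hadoop calls RecordWriter.close() once per task after the last record, which is where the database gets flushed to HDFS:

```java
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.OutputFormat;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.pig.StoreFunc;
import org.apache.pig.data.Tuple;

public class LocalDatabaseStorage extends StoreFunc {
    private RecordWriter<Object, Object> writer;

    @Override
    public OutputFormat getOutputFormat() {
        // Custom OutputFormat whose RecordWriter flushes the database on close()
        return new FileOutputFormat<Object, Object>() {
            @Override
            public RecordWriter<Object, Object> getRecordWriter(TaskAttemptContext ctx) {
                final LocalDatabase ld = new LocalDatabase("localFilePath"); // hypothetical
                return new RecordWriter<Object, Object>() {
                    public void write(Object key, Object value) {
                        ld.addValue(key, value); // hypothetical
                    }
                    public void close(TaskAttemptContext context) {
                        // One flush per task, after all records are written
                        saveLocalDatabaseInHDFS(ld); // hypothetical
                    }
                };
            }
        };
    }

    @Override
    public void setStoreLocation(String location, Job job) throws IOException {
        FileOutputFormat.setOutputPath(job, new Path(location));
    }

    @Override
    public void prepareToWrite(RecordWriter writer) {
        this.writer = writer;
    }

    @Override
    public void putNext(Tuple t) throws IOException {
        try {
            writer.write(t.get(0), t.get(1));
        } catch (InterruptedException e) {
            throw new IOException(e);
        }
    }
}
```

You'd then use it from Pig Latin with something like STORE data INTO 'output' USING LocalDatabaseStorage(); the class names here are placeholders.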
Upvotes: 2