Manny

Reputation: 195

Pig: Perform task on completion of UDF

In Hadoop I have a Reducer that looks like this; it transforms data from a prior mapper into a series of files of a non-InputFormat-compatible type.

private LocalDatabase ld;

protected void setup(Context context) {
    // Open the local database once per reduce task
    ld = new LocalDatabase("localFilePath");
}

protected void reduce(BytesWritable key, Iterable<Text> values, Context context) {
    for (Text value : values) {
        ld.addValue(key, value);
    }
}

protected void cleanup(Context context) {
    // Runs once, after all records have been processed
    saveLocalDatabaseInHDFS(ld);
}

I am rewriting my application in Pig and can't figure out how this would be done in a Pig UDF, as there's no cleanup function or anything else to denote when the UDF has finished running. How can this be done in Pig?

Upvotes: 2

Views: 227

Answers (2)

Alan Gates

Reputation: 46

If you want something to run at the end of your UDF, use the finish() call. It is invoked after your UDF has processed all of its records, once per map or reduce task, playing the same role as the cleanup call in your reducer.
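As a minimal sketch of what that can look like, here is an EvalFunc that accumulates records and flushes them in finish(). It reuses the question's LocalDatabase and saveLocalDatabaseInHDFS helpers (not shown, and their exact signatures are assumed); the Boolean return type is just a placeholder:

import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class DatabaseWriter extends EvalFunc<Boolean> {
    private LocalDatabase ld;

    @Override
    public Boolean exec(Tuple input) throws IOException {
        if (ld == null) {
            // Lazy initialization plays the role of Reducer.setup()
            ld = new LocalDatabase("localFilePath");
        }
        // Analogous to the body of reduce(); assumes addValue accepts
        // the tuple's field objects
        ld.addValue(input.get(0), input.get(1));
        return Boolean.TRUE;
    }

    @Override
    public void finish() {
        // Called once per map or reduce task after the last record,
        // analogous to Reducer.cleanup()
        saveLocalDatabaseInHDFS(ld);
    }
}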

Upvotes: 1

Chris White

Reputation: 30089

I would say you'd need to write a StoreFunc UDF wrapping your own custom OutputFormat; then you'd have the ability to close things out in the OutputFormat's RecordWriter.close() method.

This will create a database in HDFS for each reducer, however, so if you want everything in a single file you'd need to run with a single reducer or run a secondary step to merge the databases together.
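As a rough sketch of how those pieces fit together (again borrowing the question's hypothetical LocalDatabase and saveLocalDatabaseInHDFS helpers, whose signatures are assumed):

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.OutputFormat;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.pig.StoreFunc;
import org.apache.pig.data.Tuple;

public class DatabaseStorage extends StoreFunc {
    private DatabaseRecordWriter writer;

    @Override
    public OutputFormat getOutputFormat() {
        return new DatabaseOutputFormat();
    }

    @Override
    public void setStoreLocation(String location, Job job) throws IOException {
        FileOutputFormat.setOutputPath(job, new Path(location));
    }

    @Override
    public void prepareToWrite(RecordWriter writer) {
        this.writer = (DatabaseRecordWriter) writer;
    }

    @Override
    public void putNext(Tuple t) throws IOException {
        writer.write(t.get(0), t.get(1));
    }

    public static class DatabaseOutputFormat extends FileOutputFormat<Object, Object> {
        @Override
        public RecordWriter<Object, Object> getRecordWriter(TaskAttemptContext ctx) {
            return new DatabaseRecordWriter();
        }
    }

    public static class DatabaseRecordWriter extends RecordWriter<Object, Object> {
        // One LocalDatabase per task attempt, i.e. per reducer
        private final LocalDatabase ld = new LocalDatabase("localFilePath");

        @Override
        public void write(Object key, Object value) {
            ld.addValue(key, value);
        }

        @Override
        public void close(TaskAttemptContext ctx) throws IOException {
            // The per-reducer database is finalized and saved to HDFS here
            saveLocalDatabaseInHDFS(ld);
        }
    }
}

In the Pig script you'd then use it like any other storage function: STORE data INTO 'output/path' USING DatabaseStorage();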

Upvotes: 2
