BenH
BenH

Reputation: 2120

How do you report progress from a Python UDF (for Pig)?

Here are instructions for a Java UDF, but I'd like to do this from a Python UDF.

Upvotes: 1

Views: 320

Answers (1)

mr2ert
mr2ert

Reputation: 5186

You can try to get an instance of PigProgressable:

myudf.py

from time import sleep
from org.apache.pig.tools.pigstats import PigStatusReporter

@outputSchema('i:int')
def tester(foo):
    # Sleeps for a total of 3 minutes

    e = PigStatusReporter.getInstance()
    e.progress()
    sleep(60)
    e.progress()
    sleep(60)
    e.progress()
    sleep(60)
    e.progress()

    return 1

myscript.pig

-- Waits for 1.6 minutes before killing the job
SET mapred.task.timeout 100000 ;

register 'myudf.py' using jython as myudf ;
A = LOAD '$input' AS (foo:chararray) ;
B = FOREACH A GENERATE myudf.tester(foo) ;

This example will only succeed if e.progress() is actually sending out a heartbeat, otherwise it will timeout. This test passes for me on pig 0.10.

Upvotes: 1

Related Questions