halloleo

Reputation: 10374

For Apache Pig, how do I write a Load UDF in Python?

I want to write a Python UDF Load function for Apache Pig, so that I can use it in the following way in a Pig script:

register 'myudfs.py' using jython as myfuncs;
A = load 'data' using myfuncs.myLoader() as line;

The Pig documentation provides some detail for writing Load UDFs in Java, but not in Python. I have managed to implement quite useful Eval functions with Python, but I couldn't find anything about how to write Load functions in this language.

Because I have already implemented a few Eval UDFs in Python, I would like to stick to this language for all my UDFs.

Upvotes: 3

Views: 1675

Answers (1)

mr2ert

Reputation: 5186

Yes, it is true: as far as I can tell, Python UDFs in Pig can only be Eval functions, not Load functions. You can even look at the source to verify. Notice how JythonFunction extends EvalFunc, not LoadFunc.

When I need to use Python to handle loading the file(s), I do something like this:

register 'myudfs.py' using jython as myudfs ;

A = LOAD 'foo.bar' AS (total:chararray) ; 
B = FOREACH A GENERATE myudfs.prepare_input(total) ;

This simulates a pseudo-LoadFunc.
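For illustration, here is a minimal sketch of what prepare_input in myudfs.py could look like. The input format (one comma-separated "name,count" record per line), the field names, and the output schema are assumptions made up for this example, not something from the question. The outputSchema decorator is normally made available by Pig when the script is registered with Jython; the fallback below is only there so the module can also be imported and tested outside Pig.

```python
# myudfs.py: hypothetical contents, for illustration only.

try:
    outputSchema  # expected to be provided by Pig when registered via Jython
except NameError:
    # Fallback no-op decorator so the module can be tested outside Pig.
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator


@outputSchema("rec:tuple(name:chararray, count:int)")
def prepare_input(total):
    """Parse one raw line of the assumed form 'name,count' into a tuple."""
    if total is None:
        return None
    parts = total.strip().split(",")
    if len(parts) != 2:
        return None  # malformed line; Pig sees this as a null
    name, count = parts
    try:
        return (name, int(count))
    except ValueError:
        return None  # count was not an integer
```

Note that the default loader (PigStorage) splits each line on tabs, so with the LOAD statement above, total only holds the text up to the first tab. The sketch therefore assumes the lines contain no tabs. To get separate columns instead of a single tuple, the call can be wrapped as FLATTEN(myudfs.prepare_input(total)).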

Upvotes: 3
