schoon

Reputation: 3324

Pig Python UDF and lxml

I have a Python UDF that uses lxml. My Pig job that uses the UDF fails:

File "PigParse.py", line 10, in ParseToPig
ImportError: No module named lxml

The Python script works fine as a standalone program. Line 10 is:

from lxml import etree 

Do I need to distribute lxml to the Hadoop cluster somehow? If so, how, and which version should I use?

I have seen examples of distributing nltk using Hadoop -file but nothing for Pig.

TIA!!!

Upvotes: 1

Views: 159

Answers (1)

schoon

Reputation: 3324

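I think the problem is that I'm registering the UDF with Jython:

`REGISTER 'PigParse.py' using jython as PP;`

You can't use lxml with Jython: lxml is a C extension built on libxml2 and libxslt, and Jython runs on the JVM, so it can't load CPython C extensions.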
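
One possible workaround, if the script only needs basic parsing, is to fall back to the standard library's ElementTree, which is pure Python and ships with Jython. This is a minimal sketch, not the original PigParse.py: `parse_record` and the sample XML are hypothetical, and it assumes the UDF doesn't rely on lxml-only features such as full XPath or XSLT.

```python
# Prefer lxml where it is available (CPython); otherwise fall back to the
# standard-library ElementTree, which is pure Python and works under Jython.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree


def parse_record(xml_text):
    """Hypothetical helper: return (tag, text) pairs for the root's children."""
    root = etree.fromstring(xml_text)
    # child.text is None for empty elements, so substitute an empty string.
    return [(child.tag, (child.text or "").strip()) for child in root]
```

A function like `ParseToPig` could call such a helper and return the result to Pig. Keeping the `try`/`except` import means the same file still uses lxml when run as a standalone program under CPython.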

Upvotes: 0
