Reputation: 3324
I have a Python UDF that uses lxml. My Pig job that uses the UDF fails:
File "PigParse.py", line 10, in ParseToPig
ImportError: No module named lxml
The Python script works fine as a standalone program. Line 10 is:
from lxml import etree
Do I need to distribute lxml to the Hadoop cluster somehow? If so, how, and which version should I use?
I have seen examples of distributing nltk using Hadoop -file, but nothing for Pig.
TIA!!!
Upvotes: 1
Views: 159
Reputation: 3324
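I think the problem is that I'm registering the UDF with Jython:
`REGISTER 'PigParse.py' using jython as PP;`
lxml is a C extension built on libxml2 and libxslt, so it can't be imported under Jython, which runs on the JVM and can't load CPython extension modules.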
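If the UDF only needs the basic ElementTree-style API, one possible workaround is to fall back to the standard library's `xml.etree.ElementTree` when lxml can't be imported. This is a minimal sketch, assuming the parsing uses only calls that both libraries share (`fromstring`, iterating over child elements). The function name `parse_to_pig` and its return shape are hypothetical, since the body of the real `ParseToPig` isn't shown here, and I haven't verified this inside a Pig job.

```python
# Sketch only: prefer lxml when it is importable, otherwise use the
# pure-Python ElementTree module from the standard library.
try:
    from lxml import etree  # C extension; raises ImportError under Jython
except ImportError:
    import xml.etree.ElementTree as etree  # shares the core ElementTree API


def parse_to_pig(xml_text):
    """Hypothetical stand-in for ParseToPig.

    Returns a list of (tag, text) tuples, one per direct child of the
    root element. This assumes the real UDF wants a bag of tuples.
    """
    root = etree.fromstring(xml_text)
    return [(child.tag, child.text) for child in root]
```

Anything lxml-specific (for example full XPath support or XSLT) won't be available through the fallback, so this only helps if the script sticks to the shared subset.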
Upvotes: 0