Reputation: 1430
I am trying to parse XML using Pig (version 0.12), but I get the error below:
Failed to parse: Pig script failed to parse: Failed to generate logical plan. Nested exception: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve org.apache.pig.piggybank.evaluation.xml.XPath using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
My XML file is as below:
<CATALOG>
<BOOK>
<TITLE>Hadoop Defnitive Guide</TITLE>
<AUTHOR>Tom White</AUTHOR>
<COUNTRY>US</COUNTRY>
<COMPANY>CLOUDERA</COMPANY>
<PRICE>24.90</PRICE>
<YEAR>2012</YEAR>
</BOOK>
<BOOK>
<TITLE>Programming Pig</TITLE>
<AUTHOR>Alan Gates</AUTHOR>
<COUNTRY>USA</COUNTRY>
<COMPANY>Horton Works</COMPANY>
<PRICE>30.90</PRICE>
<YEAR>2013</YEAR>
</BOOK>
</CATALOG>
I am practicing from: http://hadoopgeek.com/apache-pig-xml-parsing-xpath/
Below is the script:
REGISTER piggybank.jar
DEFINE XPath org.apache.pig.piggybank.evaluation.xml.XPath();
A = LOAD '/hadoop_books.xml' using org.apache.pig.piggybank.storage.XMLLoader('BOOK') as (x:chararray);
B = FOREACH A GENERATE XPath(x, 'BOOK/AUTHOR'), XPath(x, 'BOOK/PRICE');
dump B;
I have kept the .xml file in the Hadoop root directory.
Kindly help.
Upvotes: 0
Views: 833
Reputation: 410
I don't think you want parens in your DEFINE statement:
DEFINE XPath org.apache.pig.piggybank.evaluation.xml.XPath;
You can also debug by removing the DEFINE and referencing the UDF directly:
B = FOREACH A GENERATE
org.apache.pig.piggybank.evaluation.xml.XPath(x, 'BOOK/AUTHOR'),
org.apache.pig.piggybank.evaluation.xml.XPath(x, 'BOOK/PRICE');
If that doesn't work, then piggybank.jar is not found on your classpath and you may need to give the full path to the jar.
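As a rough sketch of what giving the full path could look like: the location /usr/lib/pig/piggybank.jar below is only an assumption for illustration, not something taken from your setup, so substitute wherever the jar actually lives on your machine.

```pig
-- Hypothetical path: replace with the real location of piggybank.jar.
REGISTER /usr/lib/pig/piggybank.jar;
```

The rest of the script stays the same. If ERROR 1070 still appears with a path you know is correct, it may be worth checking that your build of piggybank.jar actually contains the XPath class.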
Upvotes: 1