Sumit

Reputation: 1430

Error while parsing PIG-XML

I am trying to parse XML using Pig (version 0.12), but I am getting the error below:

Failed to parse: Pig script failed to parse: Failed to generate logical plan. Nested exception: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve org.apache.pig.piggybank.evaluation.xml.XPath using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]

My XML file is as below:

    <CATALOG>
      <BOOK>
        <TITLE>Hadoop Defnitive Guide</TITLE>
        <AUTHOR>Tom White</AUTHOR>
        <COUNTRY>US</COUNTRY>
        <COMPANY>CLOUDERA</COMPANY>
        <PRICE>24.90</PRICE>
        <YEAR>2012</YEAR>
      </BOOK>
      <BOOK>
        <TITLE>Programming Pig</TITLE>
        <AUTHOR>Alan Gates</AUTHOR>
        <COUNTRY>USA</COUNTRY>
        <COMPANY>Horton Works</COMPANY>
        <PRICE>30.90</PRICE>
        <YEAR>2013</YEAR>
      </BOOK>
    </CATALOG>

I am practicing from: http://hadoopgeek.com/apache-pig-xml-parsing-xpath/

Below is the script:

    REGISTER piggybank.jar

    DEFINE XPath org.apache.pig.piggybank.evaluation.xml.XPath();

    A = LOAD '/hadoop_books.xml' using org.apache.pig.piggybank.storage.XMLLoader('BOOK') as (x:chararray);

    B = FOREACH A GENERATE XPath(x, 'BOOK/AUTHOR'), XPath(x, 'BOOK/PRICE');

    dump B;

Kindly help

I have placed the .xml file in the Hadoop root directory.

Upvotes: 0

Views: 833

Answers (1)

Brian R Armstrong

Reputation: 410

I don't think you want parens in your DEFINE statement:

    DEFINE XPath org.apache.pig.piggybank.evaluation.xml.XPath;

You can also debug by removing the DEFINE and referencing the UDF directly:

    B = FOREACH A GENERATE 
        org.apache.pig.piggybank.evaluation.xml.XPath(x, 'BOOK/AUTHOR'),
        org.apache.pig.piggybank.evaluation.xml.XPath(x, 'BOOK/PRICE');

If that doesn't work, then piggybank.jar is probably not being found on your classpath, and you may need to give the full path to the jar in the REGISTER statement.
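
As a rough sketch of that last suggestion, the script could look something like the following. The location `/path/to/piggybank.jar` is only a placeholder I am assuming for illustration; substitute wherever the jar actually lives on your machine. The UDF is referenced by its fully qualified name, so no DEFINE is needed:

    -- Placeholder path (an assumption): replace with the real location of piggybank.jar
    REGISTER /path/to/piggybank.jar;

    -- Load each <BOOK> element as one chararray record
    A = LOAD '/hadoop_books.xml'
        USING org.apache.pig.piggybank.storage.XMLLoader('BOOK')
        AS (x:chararray);

    -- Call the UDF directly by its fully qualified class name
    B = FOREACH A GENERATE
        org.apache.pig.piggybank.evaluation.xml.XPath(x, 'BOOK/AUTHOR'),
        org.apache.pig.piggybank.evaluation.xml.XPath(x, 'BOOK/PRICE');

    DUMP B;

If ERROR 1070 still appears with an explicit path, it may be worth checking that the jar you registered actually contains the `org.apache.pig.piggybank.evaluation.xml.XPath` class.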

Upvotes: 1
