venkat

Reputation: 513

Parse XML and HTML in Apache Pig

How can I load and parse XML using Apache Pig? I tried the piggybank.storage.XMLLoader function, but it's not working for me. I am running the Pig job in local mode only. There are no errors, but it's not running.

Is there also a way to parse HTML pages in Apache Pig?

Please help me.

Thanks in advance.

Upvotes: 0

Views: 865

Answers (3)

Ashish Jain

Reputation: 136

You need to call org.apache.pig.piggybank.storage.XMLLoader() with an argument (the name of the XML tag that encloses each record), and use XPath as well to pull values out of each record. I found this one helpful.
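A minimal sketch of the XMLLoader + XPath combination. It assumes a hypothetical file `/data/books.xml` whose records are `<book>` elements with `<title>` and `<author>` children, a placeholder jar path, and a piggybank.jar recent enough to include the `org.apache.pig.piggybank.evaluation.xml.XPath` UDF (older piggybank builds do not ship it):

```pig
-- Register piggybank (placeholder path: adjust to your installation).
REGISTER /path/to/piggybank.jar;

-- Short alias for the piggybank XPath UDF (assumes your build includes it).
DEFINE XPath org.apache.pig.piggybank.evaluation.xml.XPath();

-- XMLLoader('book') emits one record per <book>...</book> element,
-- as a single chararray holding that element's raw XML text.
books = LOAD '/data/books.xml'
        USING org.apache.pig.piggybank.storage.XMLLoader('book')
        AS (doc:chararray);

-- Evaluate an XPath expression against each record's XML.
parsed = FOREACH books GENERATE
             XPath(doc, 'book/title')  AS title,
             XPath(doc, 'book/author') AS author;

-- A DUMP (or STORE) is needed; without one, the script has no output step to run.
DUMP parsed;
```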

Upvotes: 1

Gargi

Reputation: 113

Try this code:

register <PIG_HOME>/contrib/piggybank/java/piggybank.jar;
-- each record of A is the raw text of one <XML_tag> element
A = LOAD '/xmlfile' USING org.apache.pig.piggybank.storage.XMLLoader('<XML_tag>');
DUMP A;
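If your piggybank build has no XPath UDF, a rough fallback is Pig's built-in REGEX_EXTRACT applied to the raw chararray each record arrives as. This sketch reuses the hypothetical `/data/books.xml` with `<book>`, `<title>` and `<author>` tags and assumes each child element sits on a single line. Regular expressions are not an XML parser, so treat this as a quick workaround only:

```pig
-- Placeholder path: adjust to your installation.
REGISTER /path/to/piggybank.jar;

-- One record per <book>...</book> element, as a single chararray.
books = LOAD '/data/books.xml'
        USING org.apache.pig.piggybank.storage.XMLLoader('book')
        AS (doc:chararray);

-- REGEX_EXTRACT(string, regex, group_index) returns the captured group,
-- or null when the pattern does not match the record.
parsed = FOREACH books GENERATE
             REGEX_EXTRACT(doc, '<title>(.*?)</title>', 1)   AS title,
             REGEX_EXTRACT(doc, '<author>(.*?)</author>', 1) AS author;

DUMP parsed;
```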

By MR mode I mean running Pig in MapReduce mode (pig -x mapreduce, the default), not in local mode (pig -x local).

Hope it helps.

Upvotes: 0

Gargi

Reputation: 113

Please try running the script in MapReduce (MR) mode, because many functions and operations only work properly in MR mode.

Upvotes: 0
