Reputation: 513
How can we process XML using Apache Pig? I tried the piggybank.storage.XMLLoader function, but it's not working for me. I am running the Pig job in local mode only. There are no errors, but the job does not run.
Is there also a way to parse HTML pages in Apache Pig?
Please help me.
Thanks in advance.
Upvotes: 0
Views: 865
Reputation: 136
You need to use org.apache.pig.piggybank.storage.XMLLoader() with arguments, together with XPath. I found this one helpful.
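A minimal sketch of that combination, assuming your piggybank build includes the XPath UDF (org.apache.pig.piggybank.evaluation.xml.XPath). The input file books.xml, the BOOK/AUTHOR/PRICE element names, and the aliases are hypothetical and only for illustration:

```pig
-- <PIG_HOME> is a placeholder; point it at your own Pig install.
REGISTER <PIG_HOME>/contrib/piggybank/java/piggybank.jar;
DEFINE XPath org.apache.pig.piggybank.evaluation.xml.XPath();

-- XMLLoader takes the element name without angle brackets and
-- returns each matching <BOOK>...</BOOK> block as one chararray.
books = LOAD 'books.xml'
        USING org.apache.pig.piggybank.storage.XMLLoader('BOOK')
        AS (x:chararray);

-- Pull individual fields out of each block with XPath expressions.
fields = FOREACH books GENERATE
             XPath(x, 'BOOK/AUTHOR') AS author,
             XPath(x, 'BOOK/PRICE')  AS price;

-- Pig evaluates lazily: nothing runs until a DUMP or STORE.
DUMP fields;
```

If a script ends after the LOAD with no DUMP or STORE, Pig reports no error but also starts no job, which may match the "no errors but not running" symptom.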
Upvotes: 1
Reputation: 113
Try this code:
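REGISTER <PIG_HOME>/contrib/piggybank/java/piggybank.jar;
-- Replace <XML_tag> with the element name, written without angle brackets.
A = LOAD '/xmlfile' USING org.apache.pig.piggybank.storage.XMLLoader('<XML_tag>') AS (x:chararray);
DUMP A; -- Pig runs nothing until a DUMP or STORE statement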
By MR mode I mean running Pig in MapReduce mode (pig -x mapreduce) rather than in local mode (pig -x local).
Hope it helps.
Upvotes: 0
Reputation: 113
Please try running the script in MapReduce (MR) mode, because many functions and operations work correctly only in MR mode.
Upvotes: 0