Reputation: 438
I would like to parse an XML file like this in Pig:
<person>
<name>person1</name>
<exp>blablabla</exp>
<exp>blablabla</exp>
</person>
<person>
<name>person2</name>
<exp>blablabla</exp>
<exp>blablabla</exp>
<exp>blablabla</exp>
</person>
I already wrote a Java program that produces this output:
1,person1
2,person2
Then I can use this Pig command to load the file into a relation:
A = load '...' AS (id_person:int, name:chararray);
The program also produces a second file, with one line per <exp> element:
1,1,blablabla
1,2,blablabla
2,1,blablabla
2,2,blablabla
2,3,blablabla
I load this second file like this:
B = load '...' AS (id_person:int, id_exp:int, text:chararray);
I want to do the same thing using only Pig. Is that possible?
Thanks
Upvotes: 1
Views: 1157
Reputation: 1177
You can use Piggybank's org.apache.pig.piggybank.storage.XMLLoader to load XML data. I'm not sure I understand exactly what you want to achieve. If the numbers in your output are simply positions within a bag (the ids, and the 2nd field in the second file), you can use DataFu's bag function Enumerate (datafu.pig.bags.Enumerate) to number the elements within the bag, and then generate and store them.
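As a rough sketch of the first half (the id/name file), something along these lines might work. It assumes Pig 0.11 or later for the RANK operator, and the jar path and the relation names (raw, ranked, A) are only placeholders, not something taken from your setup. The '...' input path is left as in your question.

```pig
-- Assumption: piggybank.jar is available at this (placeholder) path.
REGISTER piggybank.jar;

-- XMLLoader('person') emits one tuple per <person>...</person> element,
-- with the whole element as a single chararray.
raw = LOAD '...' USING org.apache.pig.piggybank.storage.XMLLoader('person')
      AS (xml:chararray);

-- RANK without a BY clause prepends a sequential number to each tuple;
-- the new field is named rank_<relation>, so rank_raw here.
ranked = RANK raw;

-- Pull the text between <name> and </name> out of each element.
A = FOREACH ranked GENERATE
        rank_raw AS id_person,
        REGEX_EXTRACT(xml, '<name>(.*)</name>', 1) AS name;

DUMP A;
```

This only covers the person ids and names. For the second file you would still need to turn the <exp> elements of each person into a bag before Enumerate can number them, which this sketch does not attempt.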
Upvotes: 1