Duncan

Reputation: 10291

Pig output as XML

Has anyone else come across this problem, and if so, how did you solve it?

My Pig script "needs" to output as XML. The main body builds up XML as follows:

<Item><Val1>abc</Val1><Val2>qwe</Val2></Item>

<Item><Val1>tre</Val1><Val2>bnm</Val2></Item>

The problem is that this isn't well-formed XML, because there is no single root element. I need to wrap it like this:

<Items>
<Item>...</Item>
</Items>

But how can this be done in Pig/Hadoop? The output files are split out across multiple part-XXXXX files, so this can only be done on the merge.

Or maybe XML is completely the wrong approach, and it's always JSON!

Thanks

Duncan

Upvotes: 0

Views: 749

Answers (1)

seedhead

Reputation: 3805

Here's one possible solution. You could do a GROUP ALL immediately before your STORE so that only one part-XXXXX file is output. That would let you wrap your entire XML block with the desired <Items> tag.
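If forcing everything into one part file is too slow for your data, the wrapping could instead happen at the merge step, as the question suggests. Below is a minimal sketch in Python, not a definitive implementation. It assumes the part files have already been copied to a local directory (for example with hadoop fs -get) and that each line is one complete <Item> element. The function name wrap_parts and the paths are hypothetical.

```python
import glob
import os


def wrap_parts(output_dir, dest_path, root="Items"):
    """Concatenate part-* files into one XML file under a single root element.

    Hypothetical helper: assumes each non-empty line in every part file
    is one complete <Item>...</Item> element.
    Returns the number of part files that were merged.
    """
    # Sorting keeps the merge order stable across runs.
    part_paths = sorted(glob.glob(os.path.join(output_dir, "part-*")))
    with open(dest_path, "w", encoding="utf-8") as dest:
        dest.write("<%s>\n" % root)
        for path in part_paths:
            with open(path, encoding="utf-8") as part:
                for line in part:
                    line = line.strip()
                    if line:  # skip blank lines between items
                        dest.write(line + "\n")
        dest.write("</%s>\n" % root)
    return len(part_paths)
```

You would call it as wrap_parts("pig_output", "items.xml"). The upside over GROUP ALL is that the Pig job keeps its parallelism; the downside is an extra step outside Pig.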

Upvotes: 1
