Reputation: 132
We have a SequenceFile whose values are custom Writable objects; each object is essentially equivalent to a complex bag data type in Pig.
Is there a convenient way to write a custom function that converts the Hadoop Writable object to a bag, so that we can then process it in a Pig script?
Upvotes: 2
Views: 493
Reputation: 30089
One option is to look at elephant-bird. If you scroll down its GitHub page to the README, there is a section about Pig:
Pig
- Includes converter interface for turning Tuples into Writables and vice versa
I've never used it, and I imagine you'll have to implement some code yourself: probably an extension of the com.twitter.elephantbird.pig.util.WritableLoadCaster abstract class, which you would then pass to the SequenceFileLoader so that it loads your sequence file using your load caster implementation.
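To make the conversion step concrete, here is a minimal, dependency-free sketch of the data mapping such a converter would perform. Everything in it is an assumption for illustration: the nested `Writable` interface is a stand-in that mirrors the two methods of `org.apache.hadoop.io.Writable`, `EventsWritable` is a hypothetical custom value, and the "bag" is modelled as a plain `List<List<Object>>` rather than Pig's `DataBag` and `Tuple`. It does not use elephant-bird at all; it only shows the shape of the work (deserialize the value, then emit one tuple per record).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WritableToBagSketch {

    /** Stand-in with the same two methods as org.apache.hadoop.io.Writable. */
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    /** Hypothetical custom value: a variable-length list of (name, count) records. */
    static class EventsWritable implements Writable {
        final List<String> names = new ArrayList<>();
        final List<Integer> counts = new ArrayList<>();

        void add(String name, int count) {
            names.add(name);
            counts.add(count);
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(names.size());          // record count first
            for (int i = 0; i < names.size(); i++) {
                out.writeUTF(names.get(i));
                out.writeInt(counts.get(i));
            }
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            names.clear();                       // Writables are often reused, so reset state
            counts.clear();
            int n = in.readInt();
            for (int i = 0; i < n; i++) {
                names.add(in.readUTF());
                counts.add(in.readInt());
            }
        }
    }

    /**
     * Maps the value to a bag-like shape: one tuple (here a List<Object>) per record.
     * With Pig on the classpath, the tuples would come from TupleFactory and the
     * bag from BagFactory instead of plain lists.
     */
    public static List<List<Object>> toBag(EventsWritable value) {
        List<List<Object>> bag = new ArrayList<>();
        for (int i = 0; i < value.names.size(); i++) {
            bag.add(Arrays.<Object>asList(value.names.get(i), value.counts.get(i)));
        }
        return bag;
    }

    /** Serializes a sample value, reads it back as a loader would, and converts it. */
    public static List<List<Object>> roundTrip() {
        try {
            EventsWritable original = new EventsWritable();
            original.add("click", 3);
            original.add("view", 7);

            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));

            EventsWritable decoded = new EventsWritable();
            decoded.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return toBag(decoded);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip()); // prints [[click, 3], [view, 7]]
    }
}
```

In a real converter the body of `toBag` is the part you would move into your load caster implementation, returning Pig types so the script can declare the loaded field as a bag and `FLATTEN` it.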
Upvotes: 3