Reputation: 655
I have some pig output files and want to read them on another machine(without hadoop installation). I just want to read a tab-seperated plain text line and parse it into a java object. I am guessing we should be able to use pig.jar as dependency and be able to read it. I could not find relevant documentation. I think this class could be used? How can we provide the schema also.
Upvotes: 0
Views: 590
Reputation: 2993
I suggest you to store data in Avro serialization format. It is Pig-independent and it allows to handle complex data structures like you described (so you don't need to write your own parser). See this article for examples.
Upvotes: 1
Reputation: 650
Your pig output files are just text files, right? Then you don't need any pig or hadoop jars. Last time i worked with Pig was on amazon's EMR platform, and the output files were stashed in an s3 bucket. They were just text files and standard java can read the file in.
That class you referenced is for reading into pig from some text format.
Are you asking for a library to parse the pig data model into java objects? I.e. the text representation of tuples & bags, etc? If so then its probably easier to write it yourself. It's a VERY simple data model with only 3 -ish datatypes..
Upvotes: 0