frazman

Reputation: 33223

Parsing complex JSON in Pig?

I have a JSON file in the following format:

{ "_id" : "foo.com", "categories" : [], "h1" : { "bar==" : { "first" : 1281916800, "last" : 1316995200 }, "foo==" : { "first" : 1281916800, "last" : 1316995200 } }, "name2" : [ "foobarl.com", "foobar2.com" ], "rep" : null }

So, how do I parse this JSON in Pig?

Also, categories and rep can contain characters and might not always be empty. I made the following attempt:

a = load 'sample_json.json' using JsonLoader('id:chararray,categories:[chararray], hostt:{ (variable_a: {(first:int,last:int)})}, ns:[chararray],rep:chararray  ');

But I get this error:

org.codehaus.jackson.JsonParseException: Unexpected character ('D' (code 68)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: java.io.ByteArrayInputStream@4795b8e9; line: 1, column: 50]
    at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1291)
    at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:385)
    at org.codehaus.jackson.impl.JsonParserMinimalBase._reportUnexpectedChar(JsonParserMinimalBase.java:306)
    at org.codehaus.jackson.impl.Utf8StreamParser._handleUnexpectedValue(Utf8StreamParser.java:1582)
    at org.codehaus.jackson.impl.Utf8StreamParser.nextToken(Utf8StreamParser.java:386)
    at org.apache.pig.builtin.JsonLoader.readField(JsonLoader.java:173)
    at org.apache.pig.builtin.JsonLoader.getNext(JsonLoader.java:157)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
    at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)

Upvotes: 3

Views: 782

Answers (1)

ajay0221

Reputation: 359

You can use the Elephant Bird Pig jar to parse JSON. It can handle all sorts of JSON data. The Elephant Bird repository has some examples of parsing JSON from Pig with this jar: https://github.com/twitter/elephant-bird/tree/master/examples/src/main/pig

It doesn't break even if an expected JSON key isn't present.
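For the sample record in the question, a minimal sketch could look like the following. It assumes one JSON object per line and that the Elephant Bird jars and their json-simple dependency are available locally; the jar file names are placeholders, not real paths. The loader reads each record as a map, so no fixed schema has to be declared up front.

```pig
-- Jar names below are placeholders (assumption); point them at your own copies.
REGISTER 'elephant-bird-core.jar';
REGISTER 'elephant-bird-pig.jar';
REGISTER 'elephant-bird-hadoop-compat.jar';
REGISTER 'json-simple.jar';

-- '-nestedLoad' asks the loader to convert nested objects and arrays as well,
-- instead of leaving them as plain strings.
raw = LOAD 'sample_json.json'
      USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad')
      AS (json: map[]);

-- Pull individual keys out of the map with the # operator.
-- Keys that are absent or null in a record simply come back as null.
fields = FOREACH raw GENERATE
    (chararray) json#'_id' AS id,
    json#'categories'      AS categories,
    json#'h1'              AS h1,
    json#'name2'           AS name2,
    (chararray) json#'rep' AS rep;

DUMP fields;
```

Because the keys under h1 ("bar==", "foo==") vary from record to record, keeping h1 as a map is probably easier than trying to describe it with a fixed tuple schema.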

Upvotes: 3
