Logan

Reputation: 1371

MultiLevel JSON in PIG

I am new to Pig scripting and to working with JSON. I need to parse multi-level JSON files in Pig. For example:

{
     "firstName": "John",
     "lastName" : "Smith",
     "age"      : 25,
     "address"  :
     {
         "streetAddress": "21 2nd Street",
         "city"         : "New York",
         "state"        : "NY",
         "postalCode"   : "10021"
     },
     "phoneNumber":
     [
         {
           "type"  : "home",
           "number": "212 555-1234"
         },
         {
           "type"  : "fax",
           "number": "646 555-4567"
         }
     ]
 }

I can parse a single-level JSON file with JsonLoader(), for example JsonLoader('name:chararray,field1:int .....'), and then run joins and other operations to get the desired results. Is it possible to parse the JSON above using the built-in JsonLoader() function of Pig 0.10.0? If so, please explain how it is done and how to access individual fields of the JSON.

Upvotes: 3

Views: 1722

Answers (3)

user8476617

Reputation: 11

C = LOAD 'path' USING JsonLoader('firstName:chararray,lastName:chararray,age:int,address:(streetAddress:chararray,city:chararray,state:chararray,postalCode:chararray),phoneNumber:{(type:chararray,number:chararray)}');
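
As a rough sketch of the "accessing fields" part of the question: once the nested schema is declared, the address tuple can be read with dot notation and the phoneNumber bag can be expanded with FLATTEN. This assumes each JSON record sits on a single line of the input file (the built-in loader reads its input line by line, so the pretty-printed sample would likely need to be collapsed to one line first) and that the keys appear in the same order as in the schema. The aliases cities, phones, phone_type and phone_number are illustrative names, not part of the original answer.

```pig
-- Assumption: 'path' holds one JSON object per line, with keys in schema order.
C = LOAD 'path' USING JsonLoader('firstName:chararray,lastName:chararray,age:int,address:(streetAddress:chararray,city:chararray,state:chararray,postalCode:chararray),phoneNumber:{(type:chararray,number:chararray)}');

-- Fields of the nested address tuple are reached with dot notation.
cities = FOREACH C GENERATE firstName, address.city AS city;

-- FLATTEN turns each tuple in the phoneNumber bag into its own output row.
-- phone_type and phone_number are illustrative aliases.
phones = FOREACH C GENERATE firstName, FLATTEN(phoneNumber) AS (phone_type:chararray, phone_number:chararray);

DUMP cities;
DUMP phones;
```

With the sample record, phones should contain one row per phone entry, so the home and fax numbers end up on separate rows next to the same firstName.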

Upvotes: 1

Reddevil

Reputation: 682

It is possible by writing your own UDF. A simple UDF example is shown at the link below:

http://pig.apache.org/docs/r0.9.1/udf.html#udf-java

Upvotes: 1

dranxo

Reputation: 3388

You can handle nested json loading with Twitter's Elephant Bird: https://github.com/kevinweil/elephant-bird

a = LOAD 'file3.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');

This parses each JSON object into a map (http://pig.apache.org/docs/r0.11.1/basic.html#map-schema); a JSONArray is parsed into a DataBag of maps.
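
A minimal, untested sketch of how the resulting map might be queried, assuming the Elephant Bird jars and their dependencies have already been registered with REGISTER (jar names vary by build and are not shown here). The `AS (json:map[])` clause and the aliases person, phones, numbers, phone, phone_type and phone_number are illustrative additions, not part of the original answer.

```pig
-- Assumption: the Elephant Bird jars are already registered in this session.
-- The AS clause names the single map field produced by the loader.
a = LOAD 'file3.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS (json:map[]);

-- Top-level and nested keys are looked up with the # operator.
person = FOREACH a GENERATE json#'firstName' AS firstName, json#'address'#'city' AS city;

-- phoneNumber is loaded as a bag of maps; FLATTEN yields one row per entry.
phones = FOREACH a GENERATE json#'firstName' AS firstName, FLATTEN(json#'phoneNumber') AS (phone:map[]);

-- Each flattened entry is itself a map, so its keys are read with # as well.
numbers = FOREACH phones GENERATE firstName, phone#'type' AS phone_type, phone#'number' AS phone_number;

DUMP person;
DUMP numbers;
```

Values pulled out of an untyped map come back as bytearray, so an explicit cast such as `(chararray)json#'firstName'` may be needed before joins or comparisons.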

Upvotes: 4
