Reputation: 1371
I am new to Pig scripting and to working with JSON. I need to parse multi-level JSON files in Pig. For example:
{
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
    },
    "phoneNumber": [
        {
            "type": "home",
            "number": "212 555-1234"
        },
        {
            "type": "fax",
            "number": "646 555-4567"
        }
    ]
}
I am able to parse single-level JSON with JsonLoader(), e.g. JsonLoader('name:chararray,field1:int .....'), and then do joins and other operations to get the desired results. Is it possible to parse the JSON above using the built-in JsonLoader() function of Pig 0.10.0? If so, please explain how it is done and how to access particular fields of the JSON.
Upvotes: 3
Views: 1722
Reputation: 11
C = LOAD 'path' USING JsonLoader('firstName:chararray,lastName:chararray,age:int,address:(streetAddress:chararray,city:chararray,state:chararray,postalCode:chararray),phoneNumber:{(type:chararray,number:chararray)}');
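With a schema like this, `address` loads as a tuple and `phoneNumber` as a bag of tuples, so the nested fields can be reached with dot projection and FLATTEN. A minimal sketch, assuming each JSON record sits on a single line of the input file (the built-in JsonLoader reads one record per line); the relation names `cities` and `phones` and the aliases `phone_type` and `phone_number` are only illustrative:

```pig
-- Load with the nested schema: address is a tuple, phoneNumber is a bag of tuples.
-- 'path' is the placeholder input location from the line above.
C = LOAD 'path' USING JsonLoader('firstName:chararray,lastName:chararray,age:int,address:(streetAddress:chararray,city:chararray,state:chararray,postalCode:chararray),phoneNumber:{(type:chararray,number:chararray)}');

-- Fields inside a tuple are projected with a dot.
cities = FOREACH C GENERATE firstName, address.city AS city;

-- FLATTEN un-nests the bag, producing one output row per phone number.
phones = FOREACH C GENERATE firstName, FLATTEN(phoneNumber) AS (phone_type:chararray, phone_number:chararray);

DUMP cities;
DUMP phones;
```

For the sample record, `phones` should contain one row per entry in the `phoneNumber` array.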
Upvotes: 1
Reputation: 682
It is possible by writing your own UDF. A simple UDF example is shown at the link below:
http://pig.apache.org/docs/r0.9.1/udf.html#udf-java
Upvotes: 1
Reputation: 3388
You can handle nested JSON loading with Twitter's Elephant Bird: https://github.com/kevinweil/elephant-bird
a = LOAD 'file3.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');
This parses each JSON object into a map (see http://pig.apache.org/docs/r0.11.1/basic.html#map-schema); a JSON array is parsed into a DataBag of maps.
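Once the record is a map, fields are looked up with the `#` operator, and since nested objects are themselves maps, the lookups can be chained. A rough sketch, assuming the Elephant Bird jars and their dependencies have already been registered (jar names and paths depend on your installation and are left out), and using `json` as an illustrative name for the map field:

```pig
-- REGISTER the Elephant Bird jars and their dependencies before this point;
-- the exact jar names and paths vary by installation.

-- Each record is loaded as a single map field, named json here.
a = LOAD 'file3.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS (json:map[]);

-- Top-level values are looked up with #; the nested address object is a map too,
-- so a second # reaches inside it.
b = FOREACH a GENERATE json#'firstName' AS firstName, json#'address'#'city' AS city;

DUMP b;
```

Values pulled out of an untyped map come back as bytearray, so they may need an explicit cast (for example `(chararray)json#'firstName'`) before string operations or joins.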
Upvotes: 4