Reputation: 1029
I have a data file and a corresponding schema file stored in separate locations. I would like to load the data using the schema in the schema-file. I tried using
A= LOAD '<file path>' USING PigStorage('\u0001') as '<schema-file path>'
but get an error.
What is the syntax for correctly loading the file?
The schema file format is something like:
data1 - complex - - - - format - -
data1 event_type - - - - - long - "ends '\001'"
data1 event_id - - - - - varchar(50) - "ends '\001'"
data1 name_format - - - - - varchar(10) - "ends newline"
Upvotes: 5
Views: 7679
Reputation: 5260
It is possible to load data using a schema file.
When you store your data with the '-schema' option, PigStorage writes a .pig_schema file to the output path that holds the schema as JSON.
You can use it when loading the data:
B = LOAD '<>' USING PigStorage(',','-schema');
You can see the schema by running
describe B;
Check this good post for more details.
This feature is available beginning with Pig 0.10.
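As a rough sketch of the full round trip described above, the script below stores a relation with '-schema' and then loads it back. The paths and field names are made up for illustration, and it assumes a comma-delimited input file.

```pig
-- Paths and field names are placeholders, not taken from the question.
A = LOAD 'input/events.csv' USING PigStorage(',')
    AS (event_type:long, event_id:chararray, name_format:chararray);

-- Storing with '-schema' also writes a hidden .pig_schema file into 'output/events'.
STORE A INTO 'output/events' USING PigStorage(',', '-schema');

-- Loading the same directory picks up field names and types from .pig_schema,
-- so no AS clause is needed here.
B = LOAD 'output/events' USING PigStorage(',', '-schema');
DESCRIBE B;
```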
Upvotes: 7
Reputation: 3284
The AS clause is for specifying the schema directly, not the path to a schema file. The schema goes in parentheses, not in quotes:
A = LOAD '<file path>' USING PigStorage('\u0001') AS (type:long, id:chararray, nameformat:chararray);
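Alternatively, a file named .pig_schema containing the schema and located in your input directory could work as well. I have never tried that, though. It must be a JSON file with the following syntax, where each numeric "type" is one of Pig's DataType codes (for example, 15 for long and 55 for chararray):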
{"fields":[
{"name":"type","type":15,"description":"Fu","schema":null},
{"name":"id","type":55,"description":"Bar","schema":null},
{"name":"nameFormat","type":55,"description":"Xu","schema":null}
],"version":0,"sortKeys":[],"sortKeyOrders":[]}
This file is also generated if you specify the -schema option when storing with PigStorage.
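Applied to the schema shown in the question, the AS clause approach might look like the sketch below. Pig has no varchar type, so mapping the varchar columns to chararray is an assumption, as is the reading that the three fields sit on one line separated by '\u0001'. The path is the placeholder from the question.

```pig
-- Field names come from the question's schema file; the Pig types are an assumed mapping.
A = LOAD '<file path>' USING PigStorage('\u0001')
    AS (event_type:long, event_id:chararray, name_format:chararray);
DESCRIBE A;
```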
Upvotes: 6