Reputation: 5260
I'm working with Pig to load a range of comma-separated files/folders from Hadoop (see this question on how to load multiple files in Pig).
The problem is that each folder has a different schema file, which is located outside the folder. Is it possible to supply multiple schema files as well?
Upvotes: 4
Views: 761
Reputation: 2333
If your schema file is located outside the folder, then you have to declare the schema when you perform the load.
For example:
dataset_A = LOAD '/data/A' using PigStorage('\t') as (id:int, project:chararray, org:chararray);
dataset_B = LOAD '/data/B' using PigStorage(',') as (id:int, beta:chararray, delta:chararray, echo:int);
If you had a declared schema in a .pig_schema file within the directory, you would only have to perform the load, without having to declare the schema.
dataset_A = LOAD '/data/A' using PigStorage('\t');
dataset_B = LOAD '/data/B' using PigStorage(',');
/data/A/.pig_schema:
{"fields":
[{"name":"id","type":10,"description":"autogenerated from Pig Field Schema","schema":null},
{"name":"project","type":55,"description":"autogenerated from Pig Field Schema","schema":null},
{"name":"org","type":55,"description":"autogenerated from Pig Field Schema","schema":null}],
"version":0,"sortKeys":[],"sortKeyOrders":[]}
/data/B/.pig_schema:
{"fields":
[{"name":"id","type":10,"description":"autogenerated from Pig Field Schema","schema":null},
{"name":"beta","type":55,"description":"autogenerated from Pig Field Schema","schema":null},
{"name":"delta","type":55,"description":"autogenerated from Pig Field Schema","schema":null},
{"name":"echo","type":10,"description":"autogenerated from Pig Field Schema","schema":null}],
"version":0,"sortKeys":[],"sortKeyOrders":[]}
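If you would rather not repeat the schema in every LOAD, one option is to generate a .pig_schema file for each folder from the field list you already have. Below is a minimal Python sketch of that idea, not a definitive tool. The names `build_pig_schema` and `PIG_TYPE_CODES` are hypothetical, and only the two type codes that appear in the examples above (10 for int, 55 for chararray) are mapped. How you read your external schema files into the field list is left out, since their format is not shown in the question.

```python
import json

# Type codes copied from the .pig_schema examples above:
# 10 is used for the int fields, 55 for the chararray fields.
# Other Pig types are deliberately not mapped in this sketch.
PIG_TYPE_CODES = {"int": 10, "chararray": 55}


def build_pig_schema(fields):
    """Return .pig_schema JSON text for a list of (name, pig_type) pairs."""
    return json.dumps({
        "fields": [
            {
                "name": name,
                "type": PIG_TYPE_CODES[pig_type],
                "description": "autogenerated from Pig Field Schema",
                "schema": None,  # serialized as JSON null
            }
            for name, pig_type in fields
        ],
        "version": 0,
        "sortKeys": [],
        "sortKeyOrders": [],
    })


# Field list matching dataset_B from the answer.
schema_b = build_pig_schema(
    [("id", "int"), ("beta", "chararray"), ("delta", "chararray"), ("echo", "int")]
)
print(schema_b)
```

The resulting text would then need to be saved as /data/B/.pig_schema, for example by writing it to a local file and copying it into the folder with hadoop fs -put, so that the plain LOAD without an explicit schema can pick it up.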
Upvotes: 1