Mzf

Reputation: 5260

Pig - Load multiple files with different schema

I'm working with Pig to load a range of comma-separated files/folders from Hadoop (see this question on how to load multiple files in Pig).

The problem is that each folder has a different schema file, and that file is located outside the folder. Is it possible to supply multiple schema files as well?

Upvotes: 4

Views: 761

Answers (1)

JamCon

Reputation: 2333

If your schema files are located outside the data folders, Pig won't pick them up, so you have to declare each schema yourself in the corresponding LOAD statement.

For example:

-- Each LOAD declares its own delimiter and its own schema.
dataset_A = LOAD '/data/A' using PigStorage('\t') as (id:int, project:chararray, org:chararray);
dataset_B = LOAD '/data/B' using PigStorage(',') as (id:int, beta:chararray, delta:chararray, echo:int);
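If you then want to treat all the folders as one relation, Pig's UNION ONSCHEMA can merge relations whose schemas differ by matching columns on field name. This is a minimal sketch, assuming the two LOAD statements above; the alias `combined` is only illustrative.

```pig
-- Load each folder with its own delimiter and schema.
dataset_A = LOAD '/data/A' USING PigStorage('\t') AS (id:int, project:chararray, org:chararray);
dataset_B = LOAD '/data/B' USING PigStorage(',') AS (id:int, beta:chararray, delta:chararray, echo:int);

-- UNION ONSCHEMA aligns columns by name: the shared 'id' field becomes a single
-- column, and fields that exist in only one input are null for rows from the other.
combined = UNION ONSCHEMA dataset_A, dataset_B;

DESCRIBE combined;
```

Both inputs must have a declared (or .pig_schema-provided) schema for ONSCHEMA to work, and fields with the same name need compatible types, which holds here because `id` is an int in both.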



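If instead each directory contained a declared schema in a .pig_schema file, you would only have to perform the load: PigStorage reads the schema from that file, so no AS clause is needed. (In these files Pig encodes field types numerically: "type":10 is int and "type":55 is chararray.)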

dataset_A = LOAD '/data/A' using PigStorage('\t'); 
dataset_B = LOAD '/data/B' using PigStorage(',');



/data/A/.pig_schema:

{"fields":
    [{"name":"id","type":10,"description":"autogenerated from Pig Field Schema","schema":null},
    {"name":"project","type":55,"description":"autogenerated from Pig Field Schema","schema":null},
    {"name":"org","type":55,"description":"autogenerated from Pig Field Schema","schema":null}],
    "version":0,"sortKeys":[],"sortKeyOrders":[]}



/data/B/.pig_schema:

{"fields":
    [{"name":"id","type":10,"description":"autogenerated from Pig Field Schema","schema":null},
    {"name":"beta","type":55,"description":"autogenerated from Pig Field Schema","schema":null},
    {"name":"delta","type":55,"description":"autogenerated from Pig Field Schema","schema":null},
    {"name":"echo","type":10,"description":"autogenerated from Pig Field Schema","schema":null}],
    "version":0,"sortKeys":[],"sortKeyOrders":[]}
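The "autogenerated from Pig Field Schema" descriptions suggest these files were written by Pig itself. PigStorage's '-schema' option writes a .pig_schema file into the output directory when storing, so one way to create them is to load each folder once with its external schema declared and store it again. This is a sketch under that assumption; the output paths /data/A_with_schema and /data/B_with_schema are hypothetical.

```pig
-- Load each folder once, declaring the schema that currently lives outside it.
dataset_A = LOAD '/data/A' USING PigStorage('\t') AS (id:int, project:chararray, org:chararray);
dataset_B = LOAD '/data/B' USING PigStorage(',') AS (id:int, beta:chararray, delta:chararray, echo:int);

-- The '-schema' option makes PigStorage write a .pig_schema file
-- alongside the data in each (hypothetical) output directory.
STORE dataset_A INTO '/data/A_with_schema' USING PigStorage('\t', '-schema');
STORE dataset_B INTO '/data/B_with_schema' USING PigStorage(',', '-schema');
```

Later scripts can then load those directories with just PigStorage and the delimiter, without an AS clause.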

Upvotes: 1
