Reputation: 2253
Assuming a directory contains multiple files, if one passes the directory to Pig LOAD
A = LOAD '/SomeDir/'
it will load all the files at once (in any order, I think; I'm not sure). But if the file names are dynamic and also sequential, e.g. named by date, how can one make Pig load them in that order? Or can the Unix directory listing command ls be used?
/SomeDir$ ls
20150101.csv
20150102.csv
20150104.csv
.......
#Pig load files at once while keeping the order
Upvotes: 0
Views: 630
Reputation: 8010
The Pig LOAD statement reads input data from a specified location. Suppose your Pig command is:
A = load '/data/examples/file.txt';
This tells Pig to read the data from file.txt, which is located in /data/examples/.
Suppose your pig command is:
A = load '/data/examples/';
and the directory contains multiple files, say
20150101.csv
20150102.csv
20150104.csv
This tells Pig to read the data from the directory /data/examples/.
In this case, Pig will find all files under the directory you specify and use them as input for that load statement, and the read will happen sequentially, starting from the first file.
If the directory you specify contains other directories, the files in those directories will be included as well.
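Since a Pig relation is an unordered bag, one way to make the order explicit, rather than rely on the order in which files happen to be read, is to tag each record with its source file name and sort on it. The following is only a sketch: it assumes Pig 0.12 or later (for the -tagFile option of PigStorage) and comma-delimited files, and the field names filename, col1 and col2 are hypothetical placeholders for your own schema.

```pig
-- Sketch only: assumes Pig 0.12+ and comma-delimited input.
-- '-tagFile' prepends the source file name as the first field of every record.
-- The field names below are hypothetical; replace them with your real schema.
A = LOAD '/SomeDir/' USING PigStorage(',', '-tagFile')
    AS (filename:chararray, col1:chararray, col2:chararray);

-- Names such as 20150101.csv sort chronologically when compared as strings.
B = ORDER A BY filename ASC;

DUMP B;
```

This orders records by file, but it does not by itself preserve the original line order inside each file. If only a subset of dates is needed, LOAD also accepts Hadoop glob patterns in the path, for example '/SomeDir/201501*.csv'.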
The links below are useful for understanding the LOAD function in depth:
http://pig.apache.org/docs/r0.8.1/udf.html#Load+Functions
http://chimera.labs.oreilly.com/books/1234000001811/ch05.html#pl_load
http://pig.apache.org/docs/r0.8.1/piglatin_ref2.html#LOAD
Upvotes: 1