ArchieTiger

Reputation: 2253

Pig load multiple sequential files

Assuming there are multiple files in a directory, passing the directory to Pig's LOAD, e.g. A = LOAD '/SomeDir/', loads all the files at once (in any order, I think, but I'm not sure). But if the file names are dynamic and also sequential, e.g. named by date, how can one make Pig load them in that order? Or can the Unix directory listing command ls be used?

/SomeDir$ ls

20150101.csv
20150102.csv
20150104.csv
.......

#Pig load files at once while keeping the order 

Upvotes: 0

Views: 630

Answers (1)

Sandeep Singh

Reputation: 8010

The Pig LOAD statement reads input data from a specified location. Suppose your Pig command is:

A = load '/data/examples/file.txt';

This tells Pig to read the data from file.txt, located in /data/examples/.

Now suppose your Pig command is A = load '/data/examples/'; and the directory contains multiple files, say:

20150101.csv
20150102.csv
20150104.csv

This tells Pig to read the data from the directory /data/examples/. In this case, Pig finds all files under the directory you specify and uses them as input for that LOAD statement. The files are read starting from the first one, but note that Pig does not guarantee that the records in the resulting relation keep that order.
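As a small sketch of the directory load described here, assuming comma-delimited files (the two-field schema and the alias names are hypothetical, not taken from the question). A Hadoop glob pattern can also narrow the input to a subset of the dated files:

```pig
-- Load every file under the directory; the field names are assumed for illustration.
A = LOAD '/data/examples/' USING PigStorage(',') AS (id:chararray, value:chararray);

-- Load only the January 2015 files by matching their names with a glob pattern.
B = LOAD '/data/examples/201501*.csv' USING PigStorage(',') AS (id:chararray, value:chararray);
```

Either form produces a single relation containing the records of all matched files.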

If the directory you specify has other directories, files in those directories will be included as well.
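If the order of the files matters downstream, it may be safer not to depend on read order at all, since a Pig relation is formally an unordered bag. One possible approach is to tag each record with its source file name and sort explicitly. This is only a sketch: it assumes Pig 0.12 or later, where PigStorage accepts the -tagFile option, and the two data fields are hypothetical:

```pig
-- '-tagFile' prepends the source file name (e.g. 20150101.csv) as the first field.
A = LOAD '/data/examples/' USING PigStorage(',', '-tagFile')
    AS (filename:chararray, id:chararray, value:chararray);

-- The file names are zero-padded dates, so lexical order matches date order.
B = ORDER A BY filename ASC;

DUMP B;
```

Because the ordering is expressed in the script itself, it does not matter in which order the files are actually read.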

The links below are useful for understanding the LOAD function in depth.

http://pig.apache.org/docs/r0.8.1/udf.html#Load+Functions

http://chimera.labs.oreilly.com/books/1234000001811/ch05.html#pl_load

http://pig.apache.org/docs/r0.8.1/piglatin_ref2.html#LOAD

Upvotes: 1
