user3627159

Reputation: 1

How do I load multiple text files from a folder in Pig using the LOAD command?

I have been using this to load a single text file:

A = LOAD '1try.txt' USING PigStorage(' ') as (c1:chararray,c2:chararray,c3:chararray,c4:chararray);

Upvotes: 0

Views: 13982

Answers (3)

Ani Menon

Reputation: 28209

data = load '/FOLDER/PATH' using PigStorage(' ') AS (<name>:<type>, ...);

OR

data = load '/FOLDER/PATH' using HBaseStorage();

Upvotes: 0

Carlos Andres Castro

Reputation: 173

The official Pig documentation states that the LOAD statement can load all the files in a directory: http://pig.apache.org/docs/r0.14.0/basic.html#load

Syntax: LOAD 'data' [USING function] [AS schema];

Where: 'data': The name of the file or directory, in single quotes. If you specify a directory name, all the files in the directory are loaded.

Upvotes: 1

Andrey Sozykin

Reputation: 926

You can use a folder name instead of a file name, like this:

A = LOAD 'myfolder' USING PigStorage(' ') 
    AS (c1:chararray,c2:chararray,c3:chararray,c4:chararray);

Pig will load all files in the specified folder, as stated in Programming Pig:

When specifying a “file” to read from HDFS, you can specify directories. In this case, Pig will find all files under the directory you specify and use them as input for that load statement. So, if you had a directory input with two datafiles today and yesterday under it, and you specified input as your file to load, Pig will read both today and yesterday as input. If the directory you specify has other directories, files in those directories will be included as well.
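If only some of the files in a folder should be read, a glob pattern in the path may help, since LOAD passes the path through Hadoop's glob matching. The following is a minimal sketch, not a tested recipe: the folder name myfolder comes from the example above, while the file names 1try.txt and 2try.txt and the assumption that every file is space-delimited with four fields are hypothetical.

```pig
-- Assumed layout: myfolder/ holds several space-delimited .txt files.
-- Load only the .txt files directly under myfolder, skipping other files.
A = LOAD 'myfolder/*.txt' USING PigStorage(' ')
    AS (c1:chararray, c2:chararray, c3:chararray, c4:chararray);

-- Load two specific files by name with a brace glob
-- (1try.txt and 2try.txt are hypothetical file names).
B = LOAD 'myfolder/{1try,2try}.txt' USING PigStorage(' ')
    AS (c1:chararray, c2:chararray, c3:chararray, c4:chararray);

-- Print the combined rows to check that every matching file was read.
DUMP A;
```

All matched files end up in a single relation, so they need to share the same delimiter and column layout for the schema in the AS clause to apply cleanly.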

Upvotes: 4
