aceminer

Reputation: 4295

Storing multiple variables in pig

I am extremely new to Pig and I am not sure what to google; the results I found didn't really solve my problem.

What I have now is:

a = LOAD 'SOME_FILE.csv' using PigStorage(',') AS schema;
C = FOREACH a GENERATE $0, $1, $2;
STORE C into 'some storage' using PigStorage(';');

What I would like to do is run this in a loop over several input files and store all the results in the same file.

How do I achieve this? In other words, I have SOME_FILE.csv, SOME_FILE_1.csv, SOME_FILE_2.csv and so on, and I want to run them all through the same FOREACH statement with only one STORE statement, or at least concatenate the results into the same output.

Sorry if I am being unclear.

Say that instead of listing each file I refer to them as 'SOME_FILE_*.csv': how do I write it all to the same file? In my case, there are more than three files to process.

Thanks.

Upvotes: 0

Views: 1117

Answers (2)

Mahesh Gupta

Reputation: 1892

You can do this in two ways:

 1. Use a glob pattern to load multiple CSV files from the same HDFS directory, or
 2. Use UNION.

Glob pattern

Create a directory in HDFS to hold all the SOME_FILE_*.csv files:

hdfs dfs -mkdir -p /user/hduser/data

Copy the CSV files into the new directory and check that they arrived:

hdfs dfs -put /location_of_file/SOME_FILE*.csv /user/hduser/data

hdfs dfs -ls /user/hduser/data

Start the Apache Pig Grunt shell in MapReduce mode:

pig -x mapreduce

a = load '/user/hduser/data/{SOME_FILE,SOME_FILE_1,SOME_FILE_2}.csv' using PigStorage(',') as schema;

dump a;
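To finish the glob approach with the question's FOREACH and a single STORE, a script could look roughly like the following. This is a sketch that assumes the files sit in /user/hduser/data as above; the three-column chararray schema (f1, f2, f3) and the output path /user/hduser/output are hypothetical placeholders. Because Hadoop's `*` also matches an empty string, SOME_FILE*.csv picks up SOME_FILE.csv as well as SOME_FILE_1.csv, SOME_FILE_2.csv and any further numbered files, so nothing has to be listed by hand.

```pig
-- Load every matching CSV in one statement; the loader expands
-- Hadoop glob syntax such as *, ?, [...] and {a,b}.
-- Hypothetical schema: replace with the real column names/types.
a = LOAD '/user/hduser/data/SOME_FILE*.csv'
    USING PigStorage(',')
    AS (f1:chararray, f2:chararray, f3:chararray);

-- Same projection as in the question, applied to all files at once.
C = FOREACH a GENERATE $0, $1, $2;

-- One STORE writes all results under a single output directory
-- (hypothetical path; it must not exist before the job runs).
STORE C INTO '/user/hduser/output' USING PigStorage(';');
```

Note that STORE produces a directory of part-* files rather than one file. If a single physical file is needed, `hdfs dfs -getmerge /user/hduser/output merged.csv` concatenates the parts into one local file (merged.csv is a hypothetical name).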

Upvotes: 0

Rijul

Reputation: 1445

Assuming your input files have the same schema:

a = LOAD 'SOME_FILE.csv' using PigStorage(',') AS schema;
b = LOAD 'SOME_FILE_1.csv' USING PigStorage(',') AS schema;
c = LOAD 'SOME_FILE_2.csv' USING PigStorage(',') AS schema;

You can then use UNION to concatenate your inputs:

a_b_c = UNION a, b, c;
C = FOREACH a_b_c GENERATE $0, $1, $2;
STORE C into 'some storage' using PigStorage(';');
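With UNION, every additional file needs its own LOAD line, which gets tedious once there are more than a handful of files. One possible alternative keeps the script unchanged and passes the input pattern and output directory in through Pig's parameter substitution (the -param command-line option). This is a sketch under assumptions: the script name merge_files.pig, the parameter names input and output, the three-column chararray schema and the paths are all hypothetical placeholders.

```pig
-- Hypothetical script name: merge_files.pig. Example invocation:
--   pig -param input='/user/hduser/data/SOME_FILE*.csv' -param output='/user/hduser/output' merge_files.pig
-- Pig replaces the input and output parameters before parsing the script.

-- Hypothetical schema: adjust names/types to the real CSV columns.
a = LOAD '$input' USING PigStorage(',')
    AS (f1:chararray, f2:chararray, f3:chararray);

-- Same projection as in the question.
C = FOREACH a GENERATE $0, $1, $2;

-- Single STORE for all inputs; the output directory must not already exist.
STORE C INTO '$output' USING PigStorage(';');
```

Single-quoting the pattern on the command line stops the local shell from expanding it, so Pig receives the glob unchanged and Hadoop expands it against HDFS.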

Upvotes: 4
