Reputation: 4295
I am extremely new to Pig and I am not sure what to search for; the results I found did not really solve my problem.
What I have now is:
a = LOAD 'SOME_FILE.csv' USING PigStorage(',') AS schema;
C = FOREACH a GENERATE $0, $1, $2;
STORE C INTO 'some storage' USING PigStorage(';');
What I would like to do is run this in a loop over several files and store the results in the same output. How do I achieve this? In other words, I have SOME_FILE.csv, SOME_FILE_1.csv, SOME_FILE_2.csv, and so on, but I want to run them all through the same FOREACH statement and run only one STORE statement, or at least concatenate the results into the same output.
Sorry if I am being unclear.
Say I load the input as 'SOME_FILE_*.csv' instead; how do I write it all to the same output file? In this case, the number of files I need to process is more than 3.
Thanks.
Upvotes: 0
Views: 1117
Reputation: 1892
You can do this with a glob expression: put all the CSV files in the same HDFS directory and load them with a single glob in the LOAD path.
Create a directory in HDFS and put all the SOME_FILE_*.csv files into it:
hadoop dfs -mkdir -p /user/hduser/data
Put the CSV files into the created directory and check that they are there:
hadoop dfs -put /location_of_file/some_files*.csv /user/hduser/data
hadoop dfs -ls /user/hduser/data
Go to the Grunt shell of Apache Pig using:
pig -x mapreduce
a = LOAD '/user/hduser/data/{SOME_FILE,SOME_FILE_1,SOME_FILE_2}.csv' USING PigStorage(',') AS schema;
dump a;
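The dump just shows the combined input; to match the script in the question, a FOREACH and a single STORE can follow (a minimal sketch, and the output path below is only a placeholder):
-- Project the first three columns of every loaded file and write one combined output.
c = FOREACH a GENERATE $0, $1, $2;
STORE c INTO '/user/hduser/output' USING PigStorage(';');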
Upvotes: 0
Reputation: 1445
Assuming your input files have the same schema, then:
a = LOAD 'SOME_FILE.csv' using PigStorage(',') AS schema;
b = LOAD 'SOME_FILE_1.csv' USING PigStorage(',') AS schema;
c = LOAD 'SOME_FILE_2.csv' USING PigStorage(',') AS schema;
You can use UNION to concatenate your inputs:
a_b_c = UNION a,b,c;
C = FOREACH a_b_c GENERATE $0, $1, $2;
STORE C INTO 'some storage' USING PigStorage(';');
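Since the question mentions more than three inputs, the same idea can also be written with a glob in a single LOAD instead of one relation per file (a sketch; 'SOME_FILE*.csv' is assumed to match all of the inputs):
-- One LOAD with a wildcard replaces the separate LOAD statements and the UNION.
a_b_c = LOAD 'SOME_FILE*.csv' USING PigStorage(',') AS schema;
C = FOREACH a_b_c GENERATE $0, $1, $2;
STORE C INTO 'some storage' USING PigStorage(';');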
Upvotes: 4