Reputation: 11
I have two input files
Student file :
abc 30 4.5
xyz 34 9.5
def 28 6.5
klm 35 10.5
Location file :
abc hawthorne
xyz artesia
def garnet
klm vanness
My desired output:
abc hawthorne
xyz artesia
def garnet
klm vanness
To achieve this, I wrote the following Pig script.
A = LOAD '/user/hive/warehouse/students.txt' USING PigStorage(' ') AS (NAME:CHARARRAY,AGE:INT,GPA:FLOAT);
B = LOAD '/user/hive/warehouse/location.txt.txt' using PigStorage(' ') AS (NAME:CHARARRAY,LOCATION:CHARARRAY);
C = JOIN A BY NAME , B BY LOCATION USING 'replicated';
DUMP C;
The trouble is that I don't see any output. On top of that, I see the following warnings during execution:
2014-01-22 15:18:15,829 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning ACCESSING_NON_EXISTENT_FIELD 2 time(s).
2014-01-22 15:18:15,829 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning ACCESSING_NON_EXISTENT_FIELD 2 time(s).
2014-01-22 15:18:15,829 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2014-01-22 15:18:15,829 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2014-01-22 15:18:15,832 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2014-01-22 15:18:15,832 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2014-01-22 15:18:15,841 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2014-01-22 15:18:15,841 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2014-01-22 15:18:15,841 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
Hadoop Job IDs executed by Pig: job_201401210934_0082,job_201401210934_0083
Upvotes: 0
Views: 1700
Reputation: 5634
I think you are not seeing any output because the join produces no matches. You are joining on NAME from A (abc, xyz, def, klm) and on LOCATION from B (hawthorne, artesia, garnet, vanness). No string appears in both sets of keys, so the join result is empty. Join on NAME in both relations instead (B BY NAME).
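As a rough sketch of that fix, the script from the question could be changed as below. The paths, delimiters, and schemas are copied unchanged from the question. The relation D and its projection are my own addition, there only to trim the result to the two columns in the desired output. This assumes both files really are space-delimited. The ACCESSING_NON_EXISTENT_FIELD warning may mean that some lines do not split into the expected number of fields, so the delimiter is worth checking too.

```pig
-- Loads copied from the question (paths and schemas unchanged).
A = LOAD '/user/hive/warehouse/students.txt' USING PigStorage(' ') AS (NAME:CHARARRAY, AGE:INT, GPA:FLOAT);
B = LOAD '/user/hive/warehouse/location.txt.txt' USING PigStorage(' ') AS (NAME:CHARARRAY, LOCATION:CHARARRAY);

-- Join on NAME in both relations, so the keys can actually match.
C = JOIN A BY NAME, B BY NAME USING 'replicated';

-- D is an illustrative extra step: keep only the name and location columns.
D = FOREACH C GENERATE A::NAME AS NAME, B::LOCATION AS LOCATION;
DUMP D;
```

If the result is still empty after this change, running DUMP A; and DUMP B; on their own should show whether each file is being parsed into the expected fields.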
Upvotes: 2