user3204936

Reputation: 11

Pig - Replicated Join

I have two input files

Student file :

abc 30 4.5
xyz 34 9.5
def 28 6.5
klm 35 10.5

Location file :

abc hawthorne
xyz artesia
def garnet
klm vanness

My desired output

abc hawthorne
xyz artesia
def garnet
klm vanness 

To achieve this, I wrote the following pig program.

A = LOAD '/user/hive/warehouse/students.txt' USING PigStorage(' ') AS (NAME:CHARARRAY,AGE:INT,GPA:FLOAT);
B = LOAD '/user/hive/warehouse/location.txt.txt' using PigStorage(' ') AS (NAME:CHARARRAY,LOCATION:CHARARRAY);
C = JOIN A BY NAME , B BY LOCATION USING 'replicated';
DUMP C;

The trouble is that I don't see any output. On top of that, I see the following warnings during execution:

2014-01-22 15:18:15,829 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - Encountered Warning ACCESSING_NON_EXISTENT_FIELD 2 time(s).
2014-01-22 15:18:15,829 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - Encountered Warning ACCESSING_NON_EXISTENT_FIELD 2 time(s).
2014-01-22 15:18:15,829 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - Success!
2014-01-22 15:18:15,829 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - Success!
2014-01-22 15:18:15,832 [main] INFO  org.apache.pig.data.SchemaTupleBackend  - Key [pig.schematuple] was not set... will not generate code.
2014-01-22 15:18:15,832 [main] INFO  org.apache.pig.data.SchemaTupleBackend  - Key [pig.schematuple] was not set... will not generate code.
2014-01-22 15:18:15,841 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat  - Total input paths to process : 1
2014-01-22 15:18:15,841 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil  - Total input paths to process : 1
2014-01-22 15:18:15,841 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil  - Total input paths to process : 1
Hadoop Job IDs executed by Pig: job_201401210934_0082,job_201401210934_0083

Upvotes: 0

Views: 1700

Answers (1)

Rajen Raiyarela

Reputation: 5634

I think you are not seeing any output because the join finds no matches. You are joining on NAME from A (abc, xyz, def, klm) and LOCATION from B (hawthorne, artesia, garnet, vanness). No string appears in both sets, so the join produces no rows.
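
A minimal sketch of one way to fix it, assuming both files really are space-delimited and that NAME is the intended key on both sides. The paths and schemas are copied from the question; the relation D and its projection are my own addition to match the desired output.

```pig
-- Load both inputs with the schemas given in the question.
A = LOAD '/user/hive/warehouse/students.txt' USING PigStorage(' ') AS (NAME:CHARARRAY, AGE:INT, GPA:FLOAT);
B = LOAD '/user/hive/warehouse/location.txt.txt' USING PigStorage(' ') AS (NAME:CHARARRAY, LOCATION:CHARARRAY);

-- Join on NAME in both relations. With 'replicated', the last relation (B)
-- is loaded into memory, so it should be the smaller one.
C = JOIN A BY NAME, B BY NAME USING 'replicated';

-- Keep only the name and location columns (D is an illustrative name).
D = FOREACH C GENERATE A::NAME, B::LOCATION;
DUMP D;
```

The ACCESSING_NON_EXISTENT_FIELD warnings may be a separate issue. They usually mean a script referenced a field position that some rows do not have, which can happen if the files are not actually delimited by a single space. Running DUMP A; and DUMP B; on their own would show whether the fields load as expected.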

Upvotes: 2
