Reputation: 1104
I have two relations with defined schemas. I want to get only the records from relationA that do not exist in relationB (see the left-middle visualization on this post).
I've tried the two variations below without success; both return the following error. How do I perform this type of operation in Pig?
"ERROR 1200 mismatched input 'WHERE' expecting SEMI-COLON."
join_result = JOIN relationA by (project_id, sequence_id) LEFT OUTER, relationB by (project_id, sequence_id) WHERE relationB (project_id, sequence_id)is null;
join_result = JOIN relationA by (project_id, sequence_id) LEFT OUTER, relationB by (project_id, sequence_id) WHERE (relationB.project_id is null) AND (relationB.sequence_id is null);
Upvotes: 2
Views: 3516
Reputation: 11080
Pig's JOIN has no "WHERE" clause. You will have to use FILTER to eliminate records based on a condition.
join_result = JOIN relationA by (project_id, sequence_id) LEFT OUTER, relationB by (project_id, sequence_id);
final_result = FILTER join_result BY (relationB::project_id is null AND relationB::sequence_id is null);
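For completeness, here is a minimal end-to-end sketch of the same JOIN-then-FILTER approach, with a final projection back to relationA's columns. The file paths, the tab delimiter, and the `payload` column are hypothetical placeholders, not taken from the question. Note that fields of a joined relation are addressed with the `::` disambiguation operator (`relationB::project_id`), since a dot after a relation name is read by Pig as a scalar projection rather than a field reference.

```pig
-- Hypothetical inputs: paths, delimiter, and the 'payload' column are assumptions.
relationA = LOAD 'relationA.tsv' USING PigStorage('\t')
            AS (project_id:chararray, sequence_id:chararray, payload:chararray);
relationB = LOAD 'relationB.tsv' USING PigStorage('\t')
            AS (project_id:chararray, sequence_id:chararray);

-- Left outer join: every relationA row is kept; rows without a match
-- get nulls in the relationB columns.
join_result = JOIN relationA BY (project_id, sequence_id) LEFT OUTER,
                   relationB BY (project_id, sequence_id);

-- Keep only the rows that found no match in relationB.
only_in_a = FILTER join_result BY
    (relationB::project_id IS NULL AND relationB::sequence_id IS NULL);

-- Drop the all-null relationB columns and restore relationA's field names.
final_result = FOREACH only_in_a GENERATE
    relationA::project_id  AS project_id,
    relationA::sequence_id AS sequence_id,
    relationA::payload     AS payload;

DUMP final_result;
```

The FOREACH step is optional; without it, the result still carries relationB's columns, all of them null.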
Upvotes: 4