Pardon me if I am using the wrong Pig terminology, as I am new to it.
I have 2 relations in Pig (X and Y). Both have the fields j1 and j2. I am running the operations below:
A = JOIN X BY j1 LEFT OUTER, Y BY j1;
SPLIT A INTO B IF (Y::j1 IS NULL), C OTHERWISE;
D = FOREACH B GENERATE X::j2;
Here, if I DUMP B, there is no data in it. If I DUMP C, data d1 appears. But when I DUMP D, the same data d1 appears, which is weird because B did not have any rows.
Can someone explain why this is happening?
NOTE: I have tried:
Storing B and then looking manually at the part files, but there is nothing in B.
Storing A, exiting the session, starting a new grunt session, loading A, and then executing the last 2 lines of the code (i.e. the SPLIT and the FOREACH). When I do this, the code works as expected and DUMP D shows no output data (which is correct).
FOUND THE SOLUTION: It was not actually a Pig issue. It was an issue with the jar I was using to read the data and create the relations X and Y. Basically, the jar was not able to read the CSV file properly, which caused the problem in the join operation above.
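For anyone hitting something similar, here is a rough sketch of how the loaded data could be sanity-checked before the join. It assumes X and Y are already loaded as in the question. The aliases X_sample, X_null, X_null_grp and X_null_cnt are hypothetical names used only for illustration. This is not the exact check I ran, just one way to spot a loader that is not splitting the CSV fields properly.

```pig
-- Print the schema Pig believes each relation has
DESCRIBE X;
DESCRIBE Y;

-- Look at a few rows to see whether the fields were split as expected
X_sample = LIMIT X 5;
DUMP X_sample;

-- Count rows whose join key was not read (null j1)
X_null = FILTER X BY j1 IS NULL;
X_null_grp = GROUP X_null ALL;
X_null_cnt = FOREACH X_null_grp GENERATE COUNT_STAR(X_null);
-- Prints nothing at all if no row has a null j1
DUMP X_null_cnt;
```

The same checks can be repeated for Y. If the schema or the sample rows look wrong, the problem is in the loading step and not in the JOIN or SPLIT.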