Bhuvi007

Empty Pig data frame generates data when used in a FOREACH clause

Pardon me if I am using the wrong standard Pig terminology, as I am new to it.

I have two data frames in Pig (X and Y), both having the fields j1 and j2. I am doing the following operations:

A = JOIN X by (j1) left outer, Y by (j1);

SPLIT A INTO B IF Y::j1 IS NULL, C OTHERWISE;

D = FOREACH B GENERATE X::j2;
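For anyone trying to reproduce this, a self-contained version of the steps above might look like the following. It is only a sketch that assumes X and Y are plain comma-separated files at the hypothetical paths x.csv and y.csv, loaded with Pig's built-in PigStorage rather than the custom JAR, and that j1 and j2 are chararrays.

```pig
-- Hypothetical input paths and schemas (the real X and Y came from a custom JAR).
X = LOAD 'x.csv' USING PigStorage(',') AS (j1:chararray, j2:chararray);
Y = LOAD 'y.csv' USING PigStorage(',') AS (j1:chararray, j2:chararray);

-- Left outer join: every X row is kept; Y's fields are null where there is no match.
A = JOIN X BY j1 LEFT OUTER, Y BY j1;

-- B receives the X rows with no match in Y, C receives the matched rows.
SPLIT A INTO B IF Y::j1 IS NULL, C OTHERWISE;

-- D is a projection of B, so it should have exactly as many rows as B.
D = FOREACH B GENERATE X::j2;

DUMP B;
DUMP C;
DUMP D;
```

Note that each DUMP starts a separate execution of the plan from the LOAD statements, so B and D are not computed from one shared, stored copy of A.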

Here, if I DUMP B, there is no data in it. If I DUMP C, the data d1 appears. But when I DUMP D, the same data d1 appears, which is weird because B did not have any data points.

Can someone tell me why this is happening?

NOTE: I have tried:

  1. Storing B and then looking manually into the part files, but there is nothing in B.

  2. I have also stored A, exited the session, started a new Grunt session, loaded A, and then executed the last two lines of the code (i.e. SPLIT and FOREACH). When I do this, the code works as expected and DUMP D shows no output data (which is correct).
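The workaround from item 2 can be sketched as two separate Grunt sessions. The output path a_out and the field names x_j1, x_j2, y_j1 and y_j2 are hypothetical: the X:: and Y:: prefixes are not kept in the stored files, so the reloaded columns need new names. The sketch also assumes that PigStorage writes nulls as empty fields and reads empty fields back as nulls, so the IS NULL test still works after the round trip.

```pig
-- Session 1: materialize the join result (hypothetical output path).
STORE A INTO 'a_out' USING PigStorage(',');

-- Session 2 (new Grunt session): reload the stored join result with
-- hypothetical names, since the X:: / Y:: prefixes are not preserved.
A = LOAD 'a_out' USING PigStorage(',')
    AS (x_j1:chararray, x_j2:chararray, y_j1:chararray, y_j2:chararray);

-- Same logic as before, now reading one fixed copy of A from disk.
SPLIT A INTO B IF y_j1 IS NULL, C OTHERWISE;
D = FOREACH B GENERATE x_j2;
DUMP D;
```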

FOUND THE SOLUTION: It was not actually a Pig issue. It was an issue with the JAR I was using to read the data and create the data frames X and Y. Basically, the JAR was not able to read the CSV file properly, which caused the problem in the join operation above.
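One way to rule out a loader problem like this is to load the CSV with a loader that understands quoted fields and check that no join keys come through as null. The sketch below assumes the Piggybank CSVExcelStorage loader is available (the JAR path and x.csv are hypothetical) and that the source file has no genuinely empty j1 values, so any null key points at a parsing problem.

```pig
-- Hypothetical path to the Piggybank JAR.
REGISTER /path/to/piggybank.jar;

-- CSVExcelStorage handles quoted fields, so a value like "a,b" stays one field.
X = LOAD 'x.csv'
    USING org.apache.pig.piggybank.storage.CSVExcelStorage(',')
    AS (j1:chararray, j2:chararray);

-- Null keys never match in a JOIN; if the source has no empty keys,
-- any rows here indicate the file was not parsed correctly.
bad_x = FILTER X BY j1 IS NULL;
DUMP bad_x;
```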
