Sri

Reputation: 187

PIG load not giving correct output

I have been trying to load a file from HDFS and check the result using DUMP, but I am not getting the desired output. My input file ('/results') looks like this:

1   fail

2   fail

3   pass

4   pass

5   fail

6   pass 

7   fail

8   pass

9   pass

10  pass

11  pass

12  fail

13  fail    

14  fail

15  pass

16  pass

17  pass

18  pass

19  pass

20  fail

And this is the Pig script I am running:

 A = LOAD '/results' using PigStorage() as (f1:int, f2:chararray); 
 Dump A;

But I am getting the output as follows:

(1,fail)
(,)
(2,fail)
(,)
(3,pass)
(,)
(4,pass)
(,)
(5,fail)
(,)
(6,pass )
(,)
(7,fail)
(,)
(8,pass)
(,)
(9,pass)
(,)
(10,pass)
(,)
(11,pass)
(,)
(12,fail)
(,)
(13,fail)
(,)
(14,fail)
(,)
(15,pass)
(,)
(16,pass)
(,)
(17,pass)
(,)
(18,pass)
(,)
(19,pass)
(,)
(20,fail)

I really don't understand where the "(,)" between every two tuples comes from. Can someone help me out?

Thanks.

Upvotes: 0

Views: 103

Answers (2)

Shantanu Fadnis

Reputation: 16

You have to pass the correct delimiter to PigStorage() in order to read the file contents correctly. Choose it based on the delimiter used in your input data, for example:

For single space:

-- INPUT is a reserved keyword in Pig Latin, so a different alias is used here
A = LOAD '/results' USING PigStorage(' ') AS (f1:int, f2:chararray);
DUMP A;

For tab delimited:

A = LOAD '/results' USING PigStorage('\t') AS (f1:int, f2:chararray);
DUMP A;

As for the (,) tuples in the output: your input data has an empty line between every two records. Each empty line is loaded as a tuple whose fields are all null, which DUMP prints as (,).

Solution:

Filter out the null records (assuming the delimiter is a tab):

A = LOAD '/results' USING PigStorage('\t') AS (f1:int, f2:chararray);
-- Drop the all-null tuples produced by the empty lines
B = FILTER A BY f2 IS NOT NULL;
DUMP B;
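
To make the reasoning concrete, here is a rough Python sketch of what happens to the empty lines. It is only a simulation under stated assumptions, not Pig's actual implementation: `load_like_pigstorage` is a hypothetical helper, `None` stands in for Pig's null, and the sample data is a shortened, tab-delimited version of the file in the question.

```python
def load_like_pigstorage(text, delimiter="\t"):
    """Hypothetical helper: approximate loading lines as (f1:int, f2:chararray)."""
    rows = []
    for line in text.splitlines():
        fields = line.split(delimiter)
        raw_f1 = fields[0] if len(fields) > 0 else ""
        raw_f2 = fields[1] if len(fields) > 1 else ""
        # A value that cannot be read as an int becomes None (stand-in for null).
        try:
            f1 = int(raw_f1)
        except ValueError:
            f1 = None
        # An empty or missing field also becomes None.
        f2 = raw_f2 if raw_f2 != "" else None
        rows.append((f1, f2))
    return rows


# Shortened sample with an empty line between records, as in the question.
sample = "1\tfail\n\n2\tfail\n\n3\tpass\n"

rows = load_like_pigstorage(sample)
# Equivalent in spirit to: FILTER ... BY f2 IS NOT NULL
kept = [row for row in rows if row[1] is not None]

for row in rows:
    print(row)
print("after filter:", kept)
```

Every empty line yields a tuple with no usable fields, which is what DUMP shows as (,); the filter keeps only tuples whose second field is present.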

Thank You.

Upvotes: 0

nobody

Reputation: 11080

You have to specify the delimiter between the columns of your input file in PigStorage. Assuming your columns are separated by a single space:

A = LOAD '/results' USING PigStorage(' ') as (f1:int, f2:chararray); 
DUMP A;

If they are separated by a tab:

A = LOAD '/results' USING PigStorage('\t') as (f1:int, f2:chararray); 
DUMP A;

Upvotes: 0
