Reputation: 33495
I am using Hadoop 1.0.3 and Pig 0.11.0 on Ubuntu 12.04. The part-m-00000 file in HDFS has the following content:
training@BigDataVM:~/Installations/hadoop-1.0.3$ bin/hadoop fs -cat /user/training/user/part-m-00000
1,Praveen,20,India,M
2,Prajval,5,India,M
3,Prathibha,15,India,F
I am loading it into a bag and then filtering it as below.
Users1 = load '/user/training/user/part-m-00000' as (user_id, name, age:int, country, gender);
Fltrd = filter Users1 by age <= 16;
When I dump Users1, 5 records are shown in the console, but dumping Fltrd returns no records.
dump Fltrd;
The following warning is shown in the Pig console:
2013-02-24 16:19:40,735 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning ACCESSING_NON_EXISTENT_FIELD 12 time(s).
It looks like I have made a simple mistake, but I couldn't figure out what it is. Please help me with this.
Upvotes: 2
Views: 1371
Reputation: 10650
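Since you haven't defined a load function, Pig uses PigStorage, whose default delimiter is '\t'. Your lines contain no tabs, so each line is read as a single field; the remaining fields, including age, do not exist (hence the ACCESSING_NON_EXISTENT_FIELD warning) and come back as null, so the filter age <= 16 matches nothing.
If part-m-00000 is a text file, try setting the delimiter to ',':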
Users1 = load '/user/training/user/part-m-00000' using PigStorage(',')
as (user_id, name, age:int, country, gender);
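To see why the delimiter matters, here is a minimal sketch in plain Python (not Pig, and not how PigStorage is implemented). It assumes the three sample lines from the question, splits them on a tab and on a comma, pads missing fields with None to stand in for Pig's null, and then applies the same age <= 16 filter. The names `load` and `filter_age` are made up for this illustration.

```python
# Plain-Python illustration only; this does not reproduce Pig's internals.
SCHEMA = ["user_id", "name", "age", "country", "gender"]
LINES = [
    "1,Praveen,20,India,M",
    "2,Prajval,5,India,M",
    "3,Prathibha,15,India,F",
]


def load(lines, delimiter):
    """Split each line on the delimiter; missing fields become None (null)."""
    rows = []
    for line in lines:
        fields = line.split(delimiter)
        fields += [None] * (len(SCHEMA) - len(fields))
        row = dict(zip(SCHEMA, fields))
        # Mimic age:int from the schema: cast when present, stay null otherwise.
        row["age"] = int(row["age"]) if row["age"] is not None else None
        rows.append(row)
    return rows


def filter_age(rows, limit=16):
    """Keep rows with age <= limit; a null age never passes the filter."""
    return [r for r in rows if r["age"] is not None and r["age"] <= limit]


tab_result = filter_age(load(LINES, "\t"))    # whole line lands in user_id
comma_result = filter_age(load(LINES, ","))   # five fields per row

print(len(tab_result))
print([r["name"] for r in comma_result])
```

With the tab delimiter every age is null and the filter returns nothing; with the comma delimiter the rows for ages 5 and 15 pass.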
If it's a SequenceFile then have a look at Dolan's or my answer on this question.
Upvotes: 1