Reputation: 993
I am new to PIG Latin and I am trying to solve the below problem
Find number of employees having phone number with each areacode.
EMPID ADD_ID ZIP SAL PHONE DAT
Abcd411 PbcDr60264 953492 46404 111-432-4193 20150113
Abcd874 PbcDr39353 186307 29873 100-432-9164 20150728
Abcd197 PbcDr46725 306185 31908 113-432-4191 20150410
Abcd160 PbcDr77738 330533 61313 105-432-2468 20151007
Abcd327 PbcDr10034 951703 39301 109-432-9235 20150805
Abcd172 PbcDr21679 683299 71686 105-432-5616 20150908
Abcd227 PbcDr57694 876619 46743 109-432-9181 20151101
Abcd900 PbcDr80166 970136 34242 105-432-7415 20150820
Abcd318 PbcDr34711 234066 10989 101-432-9667 20150906
Abcd702 PbcDr86734 997954 97688 105-432-6592 20151026
And below is the way I am trying to solve it.
empdata = LOAD '/home/cloudera/empData.txt' as (empId:chararray, location:chararray, zipCode:long , salary:long, phone:chararray, dateOfJoin:long);
grpdata = GROUP empdata by SUBSTRING(phone, 0, INDEXOF(phone, '-' , 0));
dataCnt = foreach grpdata generate count(grpdata);
But I am not getting error stating that its:- Invalid scalar projection: grpdata : A column needs to be projected from a relation for it to be used as a scalar
And in another problem statement for same data set
Find number of employees having date of joining between 2015-01-01 to 2015-05-28.
I tried below solution , but this time I am not getting any results.
empdata = LOAD '/home/cloudera/empData.txt' as (empId:chararray, location:chararray, zipCode:long , salary:long, phone:chararray, doj:chararray);
filtDate = filter empdata by ToDate(doj, 'yyyyMMdd') >= ToDate('20150101', 'yyyymmdd') AND ToDate(doj, 'yyyyMMdd') <= ToDate('20150528', 'yyyymmdd');
Please help with explanation.
Upvotes: 0
Views: 385
Reputation: 1482
try this
empdata = LOAD '/home/cloudera/empData.txt' as using PigStorage(' ') (empId:chararray, location:chararray, zipCode:long , salary:long, phone:chararray, dateOfJoin:long);
grpdata = GROUP empdata by SUBSTRING(phone, 0, INDEXOF(phone, '-' , 0));
dataCnt = foreach grpdata generate $0, COUNT(empdata);
Upvotes: 1
Reputation: 1482
you should count empdata
dataCnt = foreach grpdata generate COUNT(empdata);
Upvotes: 0