Arpan
Arpan

Reputation: 993

grouping and counting the records in PIG Latin

I am new to PIG Latin and I am trying to solve the below problem

Find number of employees having phone number with each areacode.

EMPID   ADD_ID     ZIP    SAL   PHONE        DAT
Abcd411 PbcDr60264 953492 46404 111-432-4193 20150113
Abcd874 PbcDr39353 186307 29873 100-432-9164 20150728
Abcd197 PbcDr46725 306185 31908 113-432-4191 20150410
Abcd160 PbcDr77738 330533 61313 105-432-2468 20151007
Abcd327 PbcDr10034 951703 39301 109-432-9235 20150805
Abcd172 PbcDr21679 683299 71686 105-432-5616 20150908
Abcd227 PbcDr57694 876619 46743 109-432-9181 20151101
Abcd900 PbcDr80166 970136 34242 105-432-7415 20150820
Abcd318 PbcDr34711 234066 10989 101-432-9667 20150906
Abcd702 PbcDr86734 997954 97688 105-432-6592 20151026

And below is the way I am trying to solve it.

empdata = LOAD '/home/cloudera/empData.txt' as (empId:chararray, location:chararray, zipCode:long , salary:long, phone:chararray, dateOfJoin:long);
grpdata = GROUP empdata by SUBSTRING(phone, 0, INDEXOF(phone, '-' , 0));
dataCnt = foreach grpdata generate count(grpdata);

But I am not getting error stating that its:- Invalid scalar projection: grpdata : A column needs to be projected from a relation for it to be used as a scalar


And in another problem statement for same data set

Find number of employees having date of joining between 2015-01-01 to 2015-05-28. 

I tried below solution , but this time I am not getting any results.

empdata = LOAD '/home/cloudera/empData.txt' as (empId:chararray, location:chararray, zipCode:long , salary:long, phone:chararray, doj:chararray);
filtDate = filter empdata by ToDate(doj, 'yyyyMMdd') >= ToDate('20150101', 'yyyymmdd') AND ToDate(doj, 'yyyyMMdd') <= ToDate('20150528', 'yyyymmdd');

Please help with explanation.

Upvotes: 0

Views: 385

Answers (2)

Vikas Madhusudana
Vikas Madhusudana

Reputation: 1482

try this

empdata = LOAD '/home/cloudera/empData.txt' as using PigStorage(' ') (empId:chararray, location:chararray, zipCode:long , salary:long, phone:chararray, dateOfJoin:long);
grpdata = GROUP empdata by SUBSTRING(phone, 0, INDEXOF(phone, '-' , 0));
dataCnt = foreach grpdata generate $0, COUNT(empdata);

Upvotes: 1

Vikas Madhusudana
Vikas Madhusudana

Reputation: 1482

you should count empdata

dataCnt = foreach grpdata generate COUNT(empdata);

Upvotes: 0

Related Questions