Reputation: 145
I'm taking an online course. My current assignment asks me to compute an average for one of the fields. I got this working with a single relation, but the current task involves computing a value from a relation created by joining two others. When I try a function based on the previously successful approach, I get this error message, which has me confused:
Invalid field projection. Projected field [join2::Y_rate2::wtd_stars] does not exist in schema:
The code I entered in the Pig shell (grunt) is:
avg = FOREACH groupedJoin2 GENERATE AVG(join2::Y_rate2::wtd_stars);
When I enter
grunt> describe groupedJoin2
this is my output:
groupedJoin2: {group: chararray,join2: {(Y_rate2::business_id: chararray,Y_rate2::stars: int,Y_rate2::useful_clipped: int,Y_rate2::wtd_stars: double,Y_m2::business_idgroup: chararray,Y_m2::num_ratings: long,Y_m2::avg_stars: double,Y_m2::avg_useful: double,Y_m2::avg_wtdstars: double)}}
I believe my problem is that I don't know how to reference the field I want to average, but hours of searching over several days have not enlightened me.
Can anyone point out how to reference the field, if that is my problem? If that isn't my problem, I'll be grateful for your pointing me in the right direction.
Upvotes: 0
Views: 597
Reputation: 410
I think you want to say: AVG(join2.Y_rate2::wtd_stars)
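The dot dereferences a bag or tuple (in this case, the join2 bag). The double colon (::) is the disambiguation operator: after a JOIN, COGROUP, CROSS, or FLATTEN, Pig prefixes field names with the relation they came from, so Y_rate2::wtd_stars is the full name of the field inside the bag.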
Bag dereferencing can be done by name (bag.field_name) or position (bag.$0). If a set of fields are dereferenced (bag.(name1, name2) or bag.($0, $1)), the expression represents a bag composed of the specified fields. (Source: http://pig.apache.org/docs/r0.15.0/basic.html#deref)
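As a rough sketch of how these forms would look against the schema in the question (the alias names avg_by_name, avg_by_pos, and pairs are made up for illustration, and the position $3 assumes wtd_stars is the fourth field of the join2 tuples, as the describe output suggests):

```pig
-- By name: dereference the join2 bag with a dot, keeping the
-- Y_rate2:: prefix as part of the field name.
avg_by_name = FOREACH groupedJoin2
              GENERATE group, AVG(join2.Y_rate2::wtd_stars) AS avg_wtd_stars;

-- By position: assumes wtd_stars is field $3 (zero-based) inside join2.
avg_by_pos = FOREACH groupedJoin2
             GENERATE group, AVG(join2.$3) AS avg_wtd_stars;

-- A set of fields: the result is a bag of two-field tuples, not a scalar.
pairs = FOREACH groupedJoin2
        GENERATE group, join2.(Y_rate2::stars, Y_rate2::wtd_stars);
```

The name-based form is usually easier to keep correct, since a positional reference silently points at a different field if the join's column order changes.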
Upvotes: 1