Reputation: 197
I have a data-set stored in HDFS in a file called temp.txt which is as follows :
US,Arizona,51.7
US,California,56.7
US,Bullhead City,51.1
India,Jaisalmer,42.4
Libya,Aziziya,57.8
Iran,Lut Desert,70.7
India,Banda,42.4
Now, I load this into Pig memory through the following command :
temp_input = LOAD '/WC/temp.txt' USING PigStorage(',') as
(country:chararray,city:chararray,temp:double);
After this, I GROUPED all the data in temp_input as :
group_country = GROUP temp_input BY country;
When I dump the data in group_country, following output is displayed on screen:
(US,{(US,Bullhead City,51.1),(US,California,56.7),(US,Arizona,51.7)})
(Iran,{(Iran,Lut Desert,70.7)})
(India,{(India,Banda,42.4),(India,Jaisalmer,42.4)})
(Libya,{(Libya,Aziziya,57.8)})
Once the data-set is grouped, I tried to fetch the country name and indivisual maximum temperatures for each each in group_country through the following query :
max_temp = foreach group_country generate group,max(temp);
This punches out an error which looks like :
017-06-21 13:20:34,708 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR
1070: Could not resolve max using imports: [, java.lang.,
org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Details at logfile: /opt/ecosystems/pig-0.16.0/pig_1498026994684.log
What should be my next move to resolve this error and fetch the required result. All help is appreciated.
Upvotes: 0
Views: 76
Reputation: 1766
While transforming relations pig use describe relationname
this will help to know how to iterate. So in your case:
desribe group_country;
Should give you a output like:
group_country: {group: chararray,temp_input: {(country: chararray,city: chararray,temp: double)}}
Then the query :
max_temp = foreach group_country GENERATE group,MAX(temp_input.temp);
Output:
(US,56.7) (Iran,70.7) (India,42.4) (Libya,57.8)
Updated as per comment:
finaldata = foreach group_country {
orderedset = order temp_input by temp DESC;
maxtemps = limit orderedset 1;
generate flatten(maxtemps);
}
Upvotes: 4