Metadata

Reputation: 2083

Bug in my Pig Latin script

I'm trying to compute a median over a file in Pig. The file looks like this:

NewYork,-1
NewYork,-5
NewYork,-2
NewYork,3
NewYork,4
NewYork,13
NewYork,11
Amsterdam,12
Amsterdam,11
Amsterdam,2
Amsterdam,1
Amsterdam,-1
Amsterdam,-4
Mumbai,1
Mumbai,4
Mumbai,5
Mumbai,-2
Mumbai,9
Mumbai,-4

The file is loaded and the data inside it is grouped as follows:

 wdata = load 'weatherdata' using PigStorage(',') as (city:chararray, temp:int);
 wdata_g = group wdata by city;

I'm trying to get the median of the temperatures for each city as follows:

wdata_tempmedian = foreach wdata_g { tu = wdata.temp as temp; ord = order tu by temp generate group, Median(ord); }

The data is ordered because it needs to be in sorted order to find the median. But I'm getting the following error message, and I can't figure out what the mistake is:

[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 3, column 53> mismatched input 'as' expecting SEMI_COLON

Any help is much appreciated.

Upvotes: 0

Views: 52

Answers (1)

nobody

Reputation: 11080

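The parser error points at 'as': a projection inside a nested FOREACH block cannot take an 'as' clause, so drop 'as temp' (the projected field is still named temp). You are also missing a ';' after ordering the temperatures.

wdata_tempmedian = FOREACH wdata_g {
                     tu = wdata.temp;
                     ord = ORDER tu BY temp;
                     GENERATE group, Median(ord);
                      }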

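OR order the whole bag inside the nested block and project temp when calling Median. (Ordering wdata_g itself by temp does not work, because the grouped relation only has the fields group and wdata.)

wdata_tempmedian = FOREACH wdata_g {
                     ord = ORDER wdata BY temp;
                     GENERATE group, Median(ord.temp);
                      }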

Note: I am assuming you are using DataFu, since Pig does not have a built-in Median function. Ensure the jar is correctly registered:

register /path/datafu-pig-incubating-1.3.1.jar;
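Putting the pieces together, here is a minimal end-to-end sketch of the script. It assumes the DataFu UDF is available as datafu.pig.stats.Median and is aliased to Median with a DEFINE statement, which the question does not show. The jar path is the same placeholder as above, so adjust both to your installation.

```pig
-- Placeholder path from the answer above; point it at your actual DataFu jar.
REGISTER /path/datafu-pig-incubating-1.3.1.jar;

-- Assumption: the UDF class is datafu.pig.stats.Median; alias it so the
-- script can call it as Median(...).
DEFINE Median datafu.pig.stats.Median();

-- Load the comma-separated (city, temp) records and group them by city.
wdata = LOAD 'weatherdata' USING PigStorage(',') AS (city:chararray, temp:int);
wdata_g = GROUP wdata BY city;

-- Inside the nested block: project the temperatures (no AS clause here),
-- sort them, and pass the sorted bag to Median.
wdata_tempmedian = FOREACH wdata_g {
    tu = wdata.temp;
    ord = ORDER tu BY temp;
    GENERATE group, Median(ord);
};

DUMP wdata_tempmedian;
```

The sort stays inside the nested block because the median is computed over a bag that is expected to be sorted already.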

Upvotes: 1
