Reputation: 913
I am writing a Pig program that loads a file whose entries are separated by tabs,
e.g. name TAB year TAB count TAB...
file = LOAD 'file.csv' USING PigStorage('\t') as (type: chararray, year: chararray,
match_count: float, volume_count: float);
-- Group by type
grouped = GROUP file BY type;
-- Flatten
by_type = FOREACH grouped GENERATE FLATTEN(group) AS (type, year, match_count, volume_count);
group_operat = FOREACH by_type GENERATE
SUM(match_count) AS sum_m,
SUM(volume_count) AS sum_v,
(float)sum_m/sm_v;
DUMP group_operat;
The issue lies in the group_operat relation I am trying to create. I want to sum all the match counts, sum all the volume counts, and divide the summed match counts by the summed volume counts.
What am I doing wrong in my arithmetic operations or in how I build the relation? The error I receive is: <line 7, column 11> pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1031: Incompatable schema: left is "type:NULL,year:NULL,match_count:NULL,volume_count:NULL", right is "group:chararray"
Thank you.
Upvotes: 4
Views: 6770
Reputation: 4724
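Try it like this. It returns each type together with the ratio of its summed match counts to its summed volume counts.
UPDATED with working code.
input.txt (tab-separated, which is the default delimiter for PigStorage()):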
A 2001 10 2
A 2002 20 3
B 2003 30 4
B 2004 40 1
PigScript:
file = LOAD 'input.txt' USING PigStorage() AS (type: chararray, year: chararray,
    match_count: float, volume_count: float);
grouped = GROUP file BY type;
group_operat = FOREACH grouped {
    sum_m = SUM(file.match_count);
    sum_v = SUM(file.volume_count);
    GENERATE group, (float)(sum_m/sum_v) AS sum_mv;
}
DUMP group_operat;
Output:
(A,6.0)
(B,14.0)
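As a cross-check on the output above, the same per-type arithmetic can be sketched in plain Python. This only illustrates what the nested FOREACH computes and is not part of the Pig script. The names `rows` and `ratio_by_type` are made up for this sketch, and the rows are the four sample records from input.txt.

```python
from collections import defaultdict

# The four sample records from input.txt: (type, year, match_count, volume_count).
rows = [
    ("A", "2001", 10.0, 2.0),
    ("A", "2002", 20.0, 3.0),
    ("B", "2003", 30.0, 4.0),
    ("B", "2004", 40.0, 1.0),
]


def ratio_by_type(records):
    """Sum match_count and volume_count per type, then divide the two sums."""
    sums = defaultdict(lambda: [0.0, 0.0])  # type -> [sum_m, sum_v]
    for rec_type, _year, match_count, volume_count in records:
        sums[rec_type][0] += match_count
        sums[rec_type][1] += volume_count
    return {t: sum_m / sum_v for t, (sum_m, sum_v) in sums.items()}


print(ratio_by_type(rows))
```

For A this is (10 + 20) / (2 + 3) = 6.0 and for B it is (30 + 40) / (4 + 1) = 14.0, which matches the DUMP output.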
Upvotes: 2
Reputation: 5891
Try this:
file = LOAD 'file.csv' USING PigStorage('\t') as (type: chararray, year: chararray,
match_count: float, volume_count: float);
grouped = GROUP file BY (type,year);
group_operat = FOREACH grouped GENERATE group,
    SUM(file.match_count) AS sum_m,
    SUM(file.volume_count) AS sum_v,
    (float)(SUM(file.match_count)/SUM(file.volume_count)) AS sum_mv;
The script above groups the results by type and year. If you want to group by type only, remove year from the GROUP statement:
grouped = GROUP file BY type;
group_operat = FOREACH grouped GENERATE group, file.year,
    SUM(file.match_count) AS sum_m,
    SUM(file.volume_count) AS sum_v,
    (float)(SUM(file.match_count)/SUM(file.volume_count)) AS sum_mv;
Upvotes: 1