user3335406

Generating the maximum number using Pig

I am trying to generate year and MAX(number) for the data below, but Pig gives me this error:

ERROR 1045: Could not infer the matching function for org.apache.pig.builtin.MAX as multiple or none of them fit. Please use an explicit cast.

I used these commands:

loadfirstoutput = load '/outt/part-r-00000' as (year:chararray, number:chararray); 
foreach2 = foreach loadfirstoutput generate year, MAX(number);  
dump foreach2;

"   8
A"  6
"0" 4004
Ng" 1
1)" 1
Co" 5
/i>"    12
#4)"    1
&amp    2
21)"    1
22)"    2
38)"    1
80)"    1
Now"    1
Son"    1
"Unk"   1
Budd"   1
Food"   1
Ginn"   1
Hate"   1
Jax)"   1
Lang"   1
More"   1
Ross"   1
Sans"   1
Sign"   2
Sons"   1
Stan"   1
"1378"  1
"1806"  1
"1900"  2
"1901"  5
"1902"  2
"1904"  1
"1906"  1
"1908"  1
"1909"  2
"1910"  1
"1911"  14
"1914"  1
"1917"  1
"1920"  29
"1921"  2
"1923"  10
"1924"  2

Upvotes: 1

Views: 2373

Answers (2)

arun

Reputation: 11013

This does not answer the question, but I hit the same error in a different situation, when different types were mixed inside MAX like this:

FOREACH fltrd GENERATE ids, MAX(TOBAG(suu, 1)) AS uu;

Here suu is a long field.

I had to turn the int literal 1 into a long by appending an L to it, like this:

FOREACH fltrd GENERATE ids, MAX(TOBAG(suu, 1L)) AS uu;

Upvotes: 0

robthewolf

Reputation: 7624

It's a bit hard to tell what's going on with your data, but assuming it follows the pattern shown, you need to group first: MAX operates on a bag, not on a single field.

loadfirstoutput = load '/outt/part-r-00000' as (name:chararray, year:chararray, number:chararray);
A = GROUP loadfirstoutput ALL;
B = FOREACH A GENERATE MAX(loadfirstoutput.number);
dump B;

This will give you the max "number".

If you want the max number per year:

loadfirstoutput = load '/outt/part-r-00000' as (name:chararray, year:chararray, number:chararray);
A = GROUP loadfirstoutput BY year;
B = FOREACH A GENERATE MAX(loadfirstoutput.number);
dump B;
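
For the two-column sample in the question, the same idea might look like the sketch below. It rests on a few assumptions. The file is taken to be tab-separated, which is the default for Pig's loader. The number field is declared as int so that MAX compares numerically; as far as I know, a chararray column would be compared as strings. The group key is kept in the output so each year appears next to its maximum. The relation names counts, by_year, and max_per_year are illustrative and not from the original post.

```
-- Assumption: two tab-separated fields per row, as in the question's sample.
counts = LOAD '/outt/part-r-00000' AS (year:chararray, number:int);

-- MAX needs a bag, so group the rows first.
by_year = GROUP counts BY year;

-- 'group' holds the year key; counts.number is the bag of numbers for that key.
max_per_year = FOREACH by_year GENERATE group AS year, MAX(counts.number) AS max_number;

DUMP max_per_year;
```

Replacing GROUP counts BY year with GROUP counts ALL, and dropping group from the GENERATE, should give the single overall maximum instead.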

Upvotes: 3
