Reputation: 5612
I have a Pig data file:
a|1,5,8,4
b|2,6,7,3
c|12,9,6,13
I need to generate the minimum value for each id:
a,1
b,2
c,6
This is what I'm trying:
Result = foreach Data generate
(chararray) id as id,(long) MIN(STRSPLIT(values, ',')) as min_value;
This throws the error "Could not infer the matching function for org.apache.pig.builtin.MIN as multiple or none of them fit. Please use an explicit cast.", because the numbers are stored in the file as a string (for example 1,5,8,4).
Upvotes: 0
Views: 310
Reputation: 979
Similar to Winnie's answer, but more robust: it can handle a variable number of values in the second column. You can use the TransposeTupleToBag UDF from the DataFu library (http://datafu.incubator.apache.org/docs/datafu/1.1.0/datafu/pig/util/TransposeTupleToBag.html):
result1 = FOREACH data GENERATE (chararray) id as id, STRSPLIT(values, ',') as numbers;
result2 = FOREACH result1 GENERATE id, TransposeTupleToBag(numbers) as numbers;
result3 = FOREACH result2 GENERATE id, MIN(numbers) as min;
Upvotes: 1
Reputation: 5801
It's a bit of a hack, but here are the steps you need to follow. These could all be done in consecutive FOREACHs, or even nested to save space; there's no reduce phase here.
STRSPLIT(values, ',') AS tup
TOBAG(tup.$0, tup.$1, tup.$2, tup.$3) AS bag
MIN(bag) AS min
Note that this requires that the number of values in each string is constant. If this is not the case, you'll need to write a UDF that produces the bag (or even the minimum itself, depending on how general you want to make it).
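For the variable-length case, the logic such a UDF would need is small. Below is a minimal sketch in Python of the "return the minimum itself" variant. The function name min_of_csv is hypothetical, and registering it with Pig and declaring its output schema are not shown; this only illustrates the parsing and comparison step, assuming the values are integers.

```python
def min_of_csv(values):
    """Return the smallest integer in a comma-separated string, or None if there is none."""
    if values is None:
        return None
    # Convert each non-empty token to an int so the comparison is numeric,
    # not lexical (as strings, "12" would sort before "6").
    numbers = [int(token) for token in values.split(",") if token.strip()]
    return min(numbers) if numbers else None


# Sample rows in the same id|values layout as the question.
rows = ["a|1,5,8,4", "b|2,6,7,3", "c|12,9,6,13"]
for row in rows:
    key, values = row.split("|", 1)
    print("%s,%d" % (key, min_of_csv(values)))
```

Because the tokens are converted to integers before comparing, the third row yields 6 rather than 12, and the function works for any number of values per row.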
Upvotes: 0