Manoj

Reputation: 5612

PIG extract min from multiple numbers

I have a Pig data file:

a|1,5,8,4
b|2,6,7,3
c|12,9,6,13

I need to generate

a,1
b,2
c,6

I'm trying

Result =  foreach Data generate 
          (chararray) id  as id,(long) MIN(STRSPLIT(values, ',')) as min_value;

This throws "Could not infer the matching function for org.apache.pig.builtin.MIN as multiple or none of them fit. Please use an explicit cast." because the numbers are stored as a single string (for example 1,5,8,4) in the file.

Upvotes: 0

Views: 310

Answers (2)

Gaurav Phapale

Reputation: 979

Similar to Winnie's answer, but more robust: it can handle a variable number of values in the second column. You can use the TransposeTupleToBag UDF from the DataFu library (http://datafu.incubator.apache.org/docs/datafu/1.1.0/datafu/pig/util/TransposeTupleToBag.html)

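-- Assumes the DataFu jar has already been REGISTERed.
DEFINE TransposeTupleToBag datafu.pig.util.TransposeTupleToBag();
result1 = FOREACH data GENERATE (chararray) id as id, STRSPLIT(values, ',') as numbers;
result2 = FOREACH result1 GENERATE id, TransposeTupleToBag(numbers) as numbers;
result3 = FOREACH result2 GENERATE id, MIN(numbers) as min;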

Upvotes: 1

reo katoa

Reputation: 5801

It's a bit of a hack, but here are the steps you need to follow. These could all be done in consecutive FOREACHs, or even nested to save space -- there's no reduce phase here.

  1. Split the string into a tuple of values, as you've done: STRSPLIT(values, ',') AS tup
  2. Put the elements of the tuple into a bag: TOBAG(tup.$0, tup.$1, tup.$2, tup.$3) AS vals
  3. Compute the min as usual: MIN(vals) AS min

Note that this requires that the number of values in each string is constant. If this is not the case, you'll need to write a UDF that produces the bag (or even the minimum itself, depending on how general you want to make it).
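As a rough illustration of the "UDF that produces the minimum itself" option, here is a minimal sketch of a Python UDF. The file name min_udf.py and the function name min_of_csv are hypothetical, and how the outputSchema decorator is supplied depends on your Pig version and script engine, so the lookup at the top is an assumption rather than a guaranteed recipe.

```python
# min_udf.py (hypothetical file name): sketch of a Pig Python UDF.
try:
    # Assumption: Pig has already put the decorator in scope (e.g. Jython engine).
    outputSchema
except NameError:
    try:
        # Assumption: the pig_util helper module is importable (e.g. streaming_python).
        from pig_util import outputSchema
    except ImportError:
        # Fallback no-op decorator so the function can be run and tested outside Pig.
        def outputSchema(schema):
            def decorator(func):
                return func
            return decorator


@outputSchema("min_value:long")
def min_of_csv(values):
    """Return the smallest integer in a comma-separated string, or None if empty."""
    if values is None:
        return None
    # Skip empty fragments so inputs like "" or "1,,2" do not raise an error.
    numbers = [int(part) for part in values.split(",") if part.strip()]
    return min(numbers) if numbers else None
```

In a Pig script this could probably be wired up with something like REGISTER 'min_udf.py' USING jython AS udfs; followed by Result = FOREACH Data GENERATE id, udfs.min_of_csv(values); which handles any number of values per row. Treat the exact registration details as something to check against your Pig installation.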

Upvotes: 0
