user2250400

Reputation: 51

Combining/separating Pig UDF returns

Suppose a Pig UDF creates two different types of data records.

How can a Pig script process the returned list of combined Tuples from this UDF in two separate ways?

For example:

public Tuple exec (Tuple input)  // input ignored in UDF for simplicity
   {
   Tuple t = TupleFactory.getInstance ().newTuple ();
   if (Math.random () < 0.5)
      t.append ("less than half");
   else
      t.append (new Date ());
   return t;
   }

The Pig script should do something like:

register ...
define myUDF ...
data = load ...;
combinedList = foreach data generate myUDF (data);

stringList = filter combinedList by $0 instanceof java.lang.String; -- ??
dateList = filter combinedList by $0 instanceof java.util.Date; -- ??

store stringList into ... ;
store dateList into ... ;

Thank you,

Upvotes: 1

Views: 230

Answers (1)

TC1

Reputation: 1

There are two issues here.

  1. A UDF should never return different data types in the same field. It goes against the principle of least surprise, and a Pig schema gives each field a single type, so a field that is sometimes a string and sometimes a date is hard for downstream operators to handle. If you want to indicate an invalid value, returning null or some invalid constant would be much more appropriate.
  2. Separating one relation into several is what the SPLIT operation is for, rather than multiple filters. Note that instanceof is not a Pig Latin operator, so the conditions in your example will not work as written; they only stand in for a real test here. The basic shape would be SPLIT combinedList INTO stringList IF $0 instanceof String, dateList IF $0 instanceof Date.
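
One way to reconcile the two points above is to have the UDF always return the same two fields: a type tag and a string value. Below is a minimal sketch of that idea in plain Java, without the Pig classes so it can be run on its own. The class name TaggedRecordDemo, the method makeRecord, and the tag values "string" and "date" are all made up for illustration; in a real UDF the two fields would be appended to the Tuple returned from exec instead of being put in a List.

```java
import java.time.Instant;
import java.util.Arrays;
import java.util.Date;
import java.util.List;

// Hypothetical helper, not part of Pig: builds a record with a fixed shape
// (tag, value) so that every row has the same field types.
final class TaggedRecordDemo {
    static final String STRING_TAG = "string"; // assumed tag name
    static final String DATE_TAG = "date";     // assumed tag name

    // roll stands in for Math.random() so the result is reproducible.
    static List<Object> makeRecord(double roll, Date now) {
        if (roll < 0.5) {
            return Arrays.<Object>asList(STRING_TAG, "less than half");
        }
        // Store the date as an ISO-8601 string so field 1 is always a string.
        Instant instant = now.toInstant();
        return Arrays.<Object>asList(DATE_TAG, instant.toString());
    }

    public static void main(String[] args) {
        // Same random choice as the UDF in the question.
        System.out.println(makeRecord(Math.random(), new Date()));
    }
}
```

With a UDF shaped like this, and assuming the returned tuple is flattened so that the tag ends up in $0, the conditions in the script become ordinary string comparisons, something like SPLIT combinedList INTO stringList IF $0 == 'string', dateList IF $0 == 'date';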

Upvotes: 0
