Suppose a Pig UDF produces two different types of data records.
How can a Pig script take the combined tuples returned by this UDF and process each type separately?
For example:
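import java.io.IOException;
import java.util.Date;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

public class MyUDF extends EvalFunc<Tuple> { // class name is illustrative
    @Override
    public Tuple exec(Tuple input) throws IOException { // input ignored in UDF for simplicity
        Tuple t = TupleFactory.getInstance().newTuple();
        if (Math.random() < 0.5)
            t.append("less than half"); // sometimes a String...
        else
            t.append(new Date());       // ...sometimes a Date
        return t;
    }
}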
The Pig script should do something like:
register ...
define myUDF ...
data = load ...;
combinedList = foreach data generate FLATTEN(myUDF(*)); -- $0 is now the String or Date appended by the UDF
stringList = filter combinedList by $0 instanceof java.lang.String; -- ??
dateList = filter combinedList by $0 instanceof java.util.Date; -- ??
store stringList into ... ;
store dateList into ... ;
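In plain Java terms, the two `filter ... instanceof` lines above are meant to route each value produced by the UDF to one output according to its runtime class. The sketch below shows only that intent outside of Pig; `RouteByType`, `Routed` and `route` are hypothetical names, and a `List<Object>` stands in for the combined relation.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

public class RouteByType {
    // Hypothetical container for the two output "relations".
    static final class Routed {
        final List<String> strings = new ArrayList<>();
        final List<Date> dates = new ArrayList<>();
    }

    // Mirrors the intent of the two filter lines: each value goes to the
    // list that matches its runtime class.
    static Routed route(List<Object> combined) {
        Routed out = new Routed();
        for (Object value : combined) {
            if (value instanceof String) {
                out.strings.add((String) value);
            } else if (value instanceof Date) {
                out.dates.add((Date) value);
            }
            // Values of any other type match neither filter and are dropped.
        }
        return out;
    }

    public static void main(String[] args) {
        List<Object> combined = new ArrayList<>();
        combined.add("less than half");
        combined.add(new Date(0L));
        combined.add("less than half");
        Routed r = route(combined);
        System.out.println(r.strings.size() + " strings, " + r.dates.size() + " dates");
    }
}
```

Running `main` prints `2 strings, 1 dates`. Pig Latin itself has no `instanceof` operator, so this routing cannot be written directly in a Pig script.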
Thank you,
There are two issues here.
First, the UDF should not put values of different types into the same field; null or some invalid constant would be much more appropriate.
Second, Pig has the SPLIT operation for that. Although your example of using instanceof within Pig is wrong, the basic usage would be like:
SPLIT combinedList INTO stringList IF $0 instanceof String, dateList IF $0 instanceof Date;
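One way to read the "null or some invalid constant" advice, sketched here as an assumption rather than as the answerer's exact design: the UDF always returns the same two fields, say a text field and an epoch-millis field, and leaves the unused one null. After `FLATTEN`, the script could then split on real field values, e.g. `SPLIT combinedList INTO stringList IF $0 is not null, dateList IF $1 is not null;`. The plain-Java sketch below models that fixed-schema row and the SPLIT semantics (each row is tested against every condition and copied into each output whose condition holds; rows matching none are dropped). `NullTaggedSplit`, `Row`, `makeRow` and `split` are hypothetical names, not Pig API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class NullTaggedSplit {
    // Hypothetical fixed-schema row: exactly one field is non-null, so every
    // row has the same shape (text, millis) no matter which branch produced it.
    static final class Row {
        final String text;
        final Long millis;
        Row(String text, Long millis) { this.text = text; this.millis = millis; }
    }

    // Stand-in for the UDF body: same random choice, but a consistent schema.
    static Row makeRow(double r, long nowMillis) {
        return r < 0.5 ? new Row("less than half", null) : new Row(null, nowMillis);
    }

    // Models SPLIT ... INTO a IF cond1, b IF cond2: every row is checked against
    // every condition independently and may land in several outputs, or none.
    static List<List<Row>> split(List<Row> rows, List<Predicate<Row>> conditions) {
        List<List<Row>> outputs = new ArrayList<>();
        for (int i = 0; i < conditions.size(); i++) outputs.add(new ArrayList<>());
        for (Row row : rows) {
            for (int i = 0; i < conditions.size(); i++) {
                if (conditions.get(i).test(row)) outputs.get(i).add(row);
            }
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        rows.add(makeRow(0.2, 1000L));
        rows.add(makeRow(0.7, 2000L));
        rows.add(makeRow(0.4, 3000L));
        List<Predicate<Row>> conds = new ArrayList<>();
        conds.add(row -> row.text != null);   // like: stringList IF $0 is not null
        conds.add(row -> row.millis != null); // like: dateList IF $1 is not null
        List<List<Row>> out = split(rows, conds);
        System.out.println(out.get(0).size() + " / " + out.get(1).size());
    }
}
```

Running `main` prints `2 / 1`. Because the schema no longer changes from row to row, the split condition becomes an ordinary null check that Pig can evaluate.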