Capacytron

Reputation: 3739

Can't invoke Java UDF which accepts Tuple input

I can't figure out how to invoke a Java UDF that accepts a Tuple as input.

gsmCell = LOAD '$gsmCell' using PigStorage('\t') as
          (branchId,
           cellId: int,
           lac: int,
           lon: double,
           lat: double
          );

gsmCellFiltered = FILTER gsmCell BY     cellId     is not null and
                                        lac        is not null and
                                        lon        is not null and
                                        lat        is not null;

gsmCellFixed = FOREACH gsmCellFiltered GENERATE FLATTEN (pig.parser.GSMCellParser(* ) )  as
                                                (cellId: int,
                                                 lac: int,
                                                 lon: double,
                                                 lat: double
                                                );

When I wrap the input for GSMCellParser in (), the UDF receives Tuple(Tuple): Pig wraps all the fields in a tuple and puts that inside one more tuple.

When I pass a list of fields, or use * or $0.., I get an exception:

Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: ERROR 1045: 
<line 28, column 57> Could not infer the matching function for pig.parser.GSMCellParser as multiple or none of them fit. Please use an explicit cast.
    at org.apache.pig.newplan.logical.visitor.TypeCheckingExpVisitor.visit(TypeCheckingExpVisitor.java:761)
    at org.apache.pig.newplan.logical.expression.UserFuncExpression.accept(UserFuncExpression.java:88)
    at org.apache.pig.newplan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:70)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
    at org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.visitExpressionPlan(TypeCheckingRelVisitor.java:191)
    at org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.visit(TypeCheckingRelVisitor.java:157)
    at org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:246)

What am I doing wrong? My aim is to feed my UDF a tuple that contains the list of fields (i.e. the size of the tuple should be 4: cellId, lac, lon, lat).

UPD: I've tried GROUP ALL:

--filter non valid records
gsmCellFiltered = FILTER gsmCell BY     cellId     is not null and
                                        lac        is not null and
                                        lon        is not null and
                                        lat        is not null and
                                        azimuth    is not null and
                                        angWidth   is not null;

gsmCellFilteredGrouped = GROUP gsmCellFiltered ALL;

--fix records
gsmCellFixed = FOREACH gsmCellFilteredGrouped GENERATE FLATTEN (pig.parser.GSMCellParser($1))  as
                                                        (cellId: int,
                                                         lac: int,
                                                         lon: double,
                                                         lat: double,
                                                         azimuth: double,
                                                         ppw,
                                                         midDist: double,
                                                         maxDist,
                                                         cellType: chararray,
                                                         angWidth: double,
                                                         gen: chararray,
                                                         startAngle: double
                                                        );



Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: ERROR 1045: 
<line 27, column 64> Could not infer the matching function for pig.parser.GSMCellParser as multiple or none of them fit. Please use an explicit cast.

The input schema for this UDF is: Tuple. I don't get the idea. A Tuple is an ordered set of fields. The LOAD function returns a tuple to me, and I want to pass the whole tuple to my UDF.

Upvotes: 1

Views: 1339

Answers (1)

Chris White

Reputation: 30089

From the signature of the T EvalFunc<T>.eval(Tuple) method, you can see that all EvalFunc UDFs are passed a Tuple - this tuple contains all the arguments passed to the UDF.

In your case, calling GSMCellParser(*) means that the first field of the Tuple will be the current tuple being processed (hence the tuple in a tuple).

Conceptually, if you want the tuple to contain just the fields, you should invoke the UDF as GSMCellParser(cellId, lac, lon, lat); the Tuple passed to the eval func would then have a schema of (int, int, double, double). This also makes your Tuple coding easier, as you don't have to fish the fields out of the passed 'tuple in a tuple'; you know that field 0 is the cellId, field 1 is the lac, etc.
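
As a rough sketch of that invocation, the FOREACH from the question could be rewritten along these lines. It assumes GSMCellParser returns a four-field tuple and that its eval method reads its arguments by position; neither is shown in the question, so treat this as an illustration rather than a verified fix. The field order follows the LOAD schema in the question.

```pig
-- Sketch only: pass the four fields explicitly so that eval(Tuple) receives
-- a flat tuple (cellId, lac, lon, lat) instead of a tuple wrapped in a tuple.
-- Assumes GSMCellParser returns a tuple of four fields (not shown in the question).
gsmCellFixed = FOREACH gsmCellFiltered
               GENERATE FLATTEN(pig.parser.GSMCellParser(cellId, lac, lon, lat))
               AS (cellId: int, lac: int, lon: double, lat: double);
```

Inside the UDF, the arguments would then be available positionally: input.get(0) for cellId, input.get(1) for lac, and so on.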

Upvotes: 1
