MonkeyBonkey

Trouble creating a Pig UDF schema

I'm trying to parse XML and am having trouble getting my UDF to return a tuple. I'm following the example from http://verboselogging.com/2010/03/31/writing-user-defined-functions-for-pig

Pig script:

titles = FOREACH programs GENERATE (px.pig.udf.PARSE_KEYWORDS(program))
    AS (root_id:chararray, keyword:chararray);

Here is the output schema code (Scala):

override def outputSchema(input: Schema): Schema = {
  try {
    val s: Schema = new Schema
    s.add(new Schema.FieldSchema("root_id", DataType.CHARARRAY))
    s.add(new Schema.FieldSchema("keyword", DataType.CHARARRAY))
    return s
  }
  catch {
    case e: Exception => {
      return null
    }
  }
}

I'm getting this error:

pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: 
ERROR 0: Given UDF returns an improper Schema. 
Schema should only contain one field of a Tuple, Bag, or a single type. 
Returns: {root_id: chararray,keyword: chararray}

Update: the final solution, in Java:

public Schema outputSchema(Schema input) {
    try {
        Schema tupleSchema = new Schema();
        tupleSchema.add(input.getField(1));
        tupleSchema.add(input.getField(0));
        return new Schema(new Schema.FieldSchema(
                getSchemaName(this.getClass().getName().toLowerCase(), input),
                tupleSchema, DataType.TUPLE));
    } catch (Exception e) {
        return null;
    }
}

Answers (1)

Matthew Moisen

You need to wrap your schema s inside another Schema object, as a single field of type TUPLE.

Try returning new Schema(new FieldSchema(getSchemaName(..., input), s, DataType.TUPLE)), as in the template below.

Here is my answer in Java (fill in your own variable names):

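import org.apache.pig.data.DataType;
import org.apache.pig.impl.logicalLayer.FrontendException;
import org.apache.pig.impl.logicalLayer.schema.Schema;
import org.apache.pig.impl.logicalLayer.schema.Schema.FieldSchema;

@Override
public Schema outputSchema(Schema input) {
    // Inner schema: describes the fields inside the returned tuple.
    Schema tupleSchema = new Schema();
    try {
        tupleSchema.add(new FieldSchema("root_id", DataType.CHARARRAY));
        tupleSchema.add(new FieldSchema("keyword", DataType.CHARARRAY));

        // Outer schema: a single TUPLE field that wraps the inner schema.
        return new Schema(new FieldSchema(getSchemaName(this.getClass().getName().toLowerCase(), input), tupleSchema, DataType.TUPLE));
    } catch (FrontendException e) {
        e.printStackTrace();
        return null;
    }
}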
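
If it helps to see why the wrapping matters, here is a minimal sketch of the shape rule quoted in the error message ("Schema should only contain one field of a Tuple, Bag, or a single type"). It does not use Pig's classes: Field, hasSingleTopLevelField, and the tuple field name parse_keywords_out are hypothetical stand-ins made up for illustration, and the real check in Pig may well do more than this.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in model only. These are NOT org.apache.pig classes; they carry just
// enough structure to mirror the rule quoted in the error message.
final class SchemaShapeSketch {

    // Hypothetical stand-in for a schema field: a name, a type label,
    // and (for nested types such as a tuple) the fields it contains.
    static final class Field {
        final String name;
        final String type;
        final List<Field> inner;

        Field(String name, String type, List<Field> inner) {
            this.name = name;
            this.type = type;
            this.inner = inner;
        }
    }

    // Assumed simplification of the rule: the schema a UDF returns must
    // contain exactly one top-level field.
    static boolean hasSingleTopLevelField(List<Field> schema) {
        return schema.size() == 1;
    }

    // Shape returned by the question's code: two top-level fields.
    static List<Field> flatSchema() {
        List<Field> schema = new ArrayList<Field>();
        schema.add(new Field("root_id", "chararray", new ArrayList<Field>()));
        schema.add(new Field("keyword", "chararray", new ArrayList<Field>()));
        return schema;
    }

    // Shape returned by the answer's code: one tuple field wrapping the two.
    static List<Field> wrappedSchema() {
        List<Field> schema = new ArrayList<Field>();
        // "parse_keywords_out" is a made-up name for illustration.
        schema.add(new Field("parse_keywords_out", "tuple", flatSchema()));
        return schema;
    }

    public static void main(String[] args) {
        System.out.println("flat schema accepted: " + hasSingleTopLevelField(flatSchema()));
        System.out.println("wrapped schema accepted: " + hasSingleTopLevelField(wrappedSchema()));
    }
}
```

The flat version has two top-level fields and fails the check; the wrapped version has one top-level field (the tuple) that still carries root_id and keyword inside it.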

Would you try:

titles = FOREACH programs GENERATE (px.pig.udf.PARSE_KEYWORDS(program));

If that doesn't error, then try:

titles = FOREACH titles GENERATE
    $0 AS root_id
    ,$1 AS keyword
;

And tell me the error?

