Kelly

Reputation: 1005

Run UDF in Apache Pig

I am at my wit's end trying to call Java functions from Grunt using Pig. I am fairly new to Hadoop, and I haven't used Linux or Java in several years (I'm a .Net girl). I have gotten functions from the provided PiggyBank.jar to work. I wrote a simple test class in Eclipse and exported the jar file to my root folder for Grunt. I run these commands in the following order and get the error below.

grunt> Register KellyProject1.jar
grunt> grades = load 'grades.txt' as (studentName:charArray, <etc> );
grunt> grades2 = foreach grades generate studentName, hadoop.Upper(studentName);

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve hadoop.Upper using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]

Upper.java:

package hadoop;

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class Upper extends EvalFunc<String>
{
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0)
            return null;
        try{
            //String str = (String)input.get(0);
            String str = "something";
            return str.toUpperCase();
        }catch(Exception e){
            throw new IOException("Caught exception processing input row ", e);
        }
    }
}  

At first I wrote a simple "hello world" static method in a regular class, and then I saw all these examples online that extended EvalFunc, so I copied that Java code down. I'm also wondering if maybe I just don't understand how to properly generate/export jar files. Eclipse is set to build automatically, so I just right-click on the project name, go to Export, and have it put the jar file in my Cloudera folder. It's hard to get my brain to turn away from Visual Studio and DLLs.

Upvotes: 0

Views: 953

Answers (2)

nik_7

Reputation: 137

You don't need the word 'hadoop' in Pig; it usually throws an error on that word. Registering the jar is certainly the right step, but one more important step is to tell Pig the fully qualified name of the class you are using as a UDF (a Java function, in your case). There are two cases.

Case 1: You define your own UDF.

grunt> Register KellyProject1.jar
grunt> DEFINE YourUdfName FullyQualifiedname;
grunt> grades = load 'grades.txt' as (studentName:charArray, <etc> );
grunt> grades2 = foreach grades generate studentName, YourUdfName(studentName);

Just make sure the jar you are registering contains the class named by the fully qualified name.
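One way to check that the jar really contains the class is to look for its entry by path. This is only a minimal sketch using the standard java.util.jar API; the class name JarClassCheck and its helper methods are made up for illustration, and the defaults (KellyProject1.jar, hadoop.Upper) are taken from the question.

```java
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper (not part of Pig): checks whether a jar holds a given class.
public class JarClassCheck {

    // A class such as hadoop.Upper is stored in a jar as hadoop/Upper.class.
    static String entryPath(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Returns true if the jar at jarPath has an entry for className.
    static boolean containsClass(String jarPath, String className) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getEntry(entryPath(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults come from the question; pass other values as arguments.
        String jarPath = args.length > 0 ? args[0] : "KellyProject1.jar";
        String className = args.length > 1 ? args[1] : "hadoop.Upper";
        System.out.println(className + " in " + jarPath + ": "
                + containsClass(jarPath, className));
    }
}
```

If the entry is missing, the export probably left out the compiled classes or put them under a different package folder. Running `jar tf KellyProject1.jar` from a shell lists the same entries without writing any code.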

Case 2: You use an existing Pig UDF from the hadoop package.

grunt> grades2 = foreach grades generate studentName, UPPER(studentName);

The second case is your case, where you are trying to use built-in Pig functions provided in the hadoop package.

Hence, for built-in Pig functions we do not need to use a fully qualified name.

Upvotes: 1

karthik

Reputation: 551

When running on Windows, open the command prompt in administrator mode. Make sure that your username doesn't contain any whitespace.

Upvotes: 0
