Reputation: 5824
What is the exact translation of the Scala code snippet below into Java?
import org.apache.spark.sql.functions.udf

def upper(s: String): String = {
  s.toUpperCase
}

val toUpper = udf(upper _)
peopleDS.select(peopleDS("name"), toUpper(peopleDS("name"))).show
Please fill in the missing statement in the Java code below:
import org.apache.spark.sql.api.java.UDF1;
UDF1<String, String> toUpper = new UDF1<String, String>() {
    public String call(final String str) throws Exception {
        return str.toUpperCase();
    }
};
peopleDS.select(peopleDS.col("name"), /* how to call toUpper on column "name" here? */).show();
Registering the UDF and then calling it via selectExpr works for me, but I need something similar to the approach shown above.
Working example:
sqlContext.udf().register(
    "toUpper",
    (String s) -> s.toUpperCase(),
    DataTypes.StringType
);
peopleDF.selectExpr("toUpper(name)","name").show();
Upvotes: 4
Views: 11353
Reputation: 15297
In Java, calling a UDF without registering it is not possible. Please check the following discussion:
Below is your UDF:
private static UDF1<String, String> toUpper = new UDF1<String, String>() {
    public String call(final String str) throws Exception {
        return str.toUpperCase();
    }
};
Register the UDF, and then you can call it with the callUDF function:
import static org.apache.spark.sql.functions.callUDF;
import static org.apache.spark.sql.functions.col;
sqlContext.udf().register("toUpper", toUpper, DataTypes.StringType);
peopleDF.select(col("name"),callUDF("toUpper", col("name"))).show();
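As a side note, on newer Spark versions (2.3 and later) it is also possible to skip name-based registration entirely: functions.udf can wrap a Java UDF1 into a UserDefinedFunction that is applied directly to columns, much like the Scala version in the question. This is a sketch under that version assumption, reusing the peopleDF and "name" column from above.

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.udf;

import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.expressions.UserDefinedFunction;
import org.apache.spark.sql.types.DataTypes;

// Wrap the UDF1 into a UserDefinedFunction; no registration by name needed.
UserDefinedFunction toUpperFn = udf(
    (UDF1<String, String>) str -> str.toUpperCase(),
    DataTypes.StringType
);

// Apply it directly, analogous to the Scala toUpper(peopleDS("name")):
peopleDF.select(col("name"), toUpperFn.apply(col("name"))).show();
```

The trade-off is that selectExpr and SQL strings still require a registered name, while apply-style calls work only with the UserDefinedFunction handle.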
Upvotes: 7
Reputation: 9
Input CSV:
+-------+--------+------+
| name| address|salary|
+-------+--------+------+
| Arun| Indore| 1|
|Shubham| Indore| 2|
| Mukesh|Hariyana| 3|
| Arun| Bhopal| 4|
|Shubham|Jabalpur| 5|
| Mukesh| Rohtak| 6|
+-------+--------+------+
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.functions;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;
public static void main(String[] args) {
    SparkConf sparkConf = new SparkConf().setAppName("test").setMaster("local");
    SparkSession sparkSession = new SparkSession(new SparkContext(sparkConf));
    Dataset<Row> dataset = sparkSession.read().option("header", "true")
            .csv("C:\\Users\\Desktop\\Spark\\user.csv");

    /** Create the UDF */
    UDF1<String, String> toLower = new UDF1<String, String>() {
        @Override
        public String call(String str) throws Exception {
            return str.toLowerCase();
        }
    };

    /** Register the UDF */
    sparkSession.udf().register("toLower", toLower, DataTypes.StringType);

    /** Call the UDF using functions.callUDF */
    dataset.select(dataset.col("name"), dataset.col("salary"),
            functions.callUDF("toLower", dataset.col("address")).alias("address")).show();
}
Output :
+-------+------+--------+
| name|salary| address|
+-------+------+--------+
| Arun| 1| indore|
|Shubham| 2| indore|
| Mukesh| 3|hariyana|
| Arun| 4| bhopal|
|Shubham| 5|jabalpur|
| Mukesh| 6| rohtak|
+-------+------+--------+
Upvotes: 1