Murli Krishnan

Reputation: 45

Using Hive JARs with PySpark

The problem is using Hive JAR UDFs in PySpark code. We follow these standard steps:

  1. Create a temporary function in the PySpark code via spark.sql:

spark.sql("create temporary function public_upper_case_udf as 'com.hive.udf.PrivateUpperCase' using JAR 'gs://hivebqjarbucket/UpperCase.jar'")

  2. Invoke the temporary function in spark.sql statements, as shown in the sketch after this list.
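For context, a minimal end-to-end sketch of both steps, assuming a Hive-enabled SparkSession; the function name, class name, and JAR path are taken from the steps above, while the names table and its column are hypothetical:

from pyspark.sql import SparkSession

# Hive support is required for CREATE TEMPORARY FUNCTION ... USING JAR
spark = SparkSession.builder \
    .appName("hive-udf-example") \
    .enableHiveSupport() \
    .getOrCreate()

# Step 1: register the Hive UDF from the JAR on GCS
spark.sql("create temporary function public_upper_case_udf as "
          "'com.hive.udf.PrivateUpperCase' using JAR 'gs://hivebqjarbucket/UpperCase.jar'")

# Step 2: invoke it like any built-in SQL function
spark.sql("select public_upper_case_udf(name) from names").show()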

The issue we are facing: if the Java class in the JAR file is not explicitly declared public, spark.sql invocations of the Hive UDF fail with the following error:

org.apache.spark.sql.AnalysisException: No handler for UDF/UDAF/UDTF 'com.hive.udf.PublicUpperCase'

Java Class Code

package com.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;

// Package-private: no "public" modifier, which triggers the error above
class PrivateUpperCase extends UDF {
    public String evaluate(String value) {
        return value.toUpperCase();
    }
}

When I make the class public, the issue seems to get resolved.

The question is whether making the class public is the only solution, or whether there is another way around it.

Any assistance is appreciated.

Note - The Hive JARs cannot be converted to Spark UDFs owing to their complexity.
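For reference, the conversion this note rules out would look roughly like the sketch below: each UDF class would have to be rewritten to implement Spark's org.apache.spark.sql.api.java.UDF1 interface instead of extending Hive's UDF, and then be registered with registerJavaFunction. The class name com.example.SparkUpperCase is hypothetical:

from pyspark.sql.types import StringType

# Spark-native registration (the path being avoided here), assuming the
# `spark` session from the sketch above; the Java class must implement
# org.apache.spark.sql.api.java.UDF1, not extend Hive's UDF
spark.udf.registerJavaFunction("spark_upper_udf", "com.example.SparkUpperCase", StringType())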

Upvotes: 0

Views: 346

Answers (1)

cdecoux

Reputation: 53

If it were not public, how would external packages call PrivateUpperCase.evaluate()?

https://www.java-made-easy.com/java-access-modifiers.html

For PrivateUpperCase to be non-public (package-private, Java's default access), the class would need to live in the same package as the code that calls PrivateUpperCase.evaluate(). You might be able to hunt that package down and set the package name to match, but otherwise the class needs to be public.

Upvotes: 0
