Reputation: 107
I am creating a function in Scala that I want to use in my Spark SQL queries. The query works fine in Hive, and also if I run it directly in Spark SQL, but I use the same query in multiple places, so I want to turn it into a reusable function/method that I can call whenever it is required. I have created the function below in my Scala class.
def date_part(date_column: Column) = {
  val m1: Column = month(to_date(from_unixtime(unix_timestamp(date_column, "dd-MM-yyyy")))) // gives value as 01, 02, ...etc
  m1 match {
    case 01 => concat(concat(year(to_date(from_unixtime(unix_timestamp(date_column, "dd-MM-yyyy")))) - 1, '-'), substr(year(to_date(from_unixtime(unix_timestamp(date_column, "dd-MM-yyyy")))), 3, 4))
    // etc..
    case _ => "some other logic"
  }
}
but it is showing multiple errors:
◾ Decimal integer literals may not have a leading zero. (Octal syntax is obsolete.)
◾ type mismatch; found : Int(0) required: org.apache.spark.sql.Column
◾ type mismatch; found : Char('-') required: org.apache.spark.sql.Column
◾ not found: value substr
Also, if I create even a simple function with Column as the parameter type, I am not able to register it, because I get an error saying this is not possible in columnar format. For all primitive data types (String, Long, Int) it works fine, but in my case the type is Column, so I cannot do this. Can someone please guide me on how I should do this? So far I have found on Stack Overflow that I need to apply this function to a DataFrame and then convert that DataFrame to a temp table. Can someone please suggest any other alternative, so that I can use this functionality without many changes to my existing code?
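For reference, this is roughly the temp-table workaround I found (a minimal sketch; df, date_column and my_table are placeholder names for my own DataFrame, column and view):

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions._

// reusable Column => Column helper (simplified logic)
def datePart(dateColumn: Column): Column =
  month(to_date(from_unixtime(unix_timestamp(dateColumn, "dd-MM-yyyy"))))

// apply it on a DataFrame first, then expose the result to Spark SQL as a temp view
val withPart = df.withColumn("date_part", datePart(col("date_column")))
withPart.createOrReplaceTempView("my_table")
spark.sql("select date_part from my_table").show()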
Upvotes: 0
Views: 233
Reputation: 312
Firstly, Spark will need to read the file where the data is stored. I guess this file is a CSV, but you can use the json method instead of csv.
Then you can add new columns with a calculated value, as follows:
import org.apache.spark.sql.functions._
val df = spark.read
.option("header", "true")
.option("inferSchema", "true")
.csv("/path/mydata.csv")
import org.apache.spark.sql.DataFrame

def transformDate(dateColumn: String, df: DataFrame): DataFrame = {
  // date parsed from the "dd-MM-yyyy" string column
  val parsed = to_date(from_unixtime(unix_timestamp(col(dateColumn), "dd-MM-yyyy")))
  df.withColumn("calculatedCol", month(parsed)) // month number: 1, 2, ...
    .withColumn("newColumnWithDate",
      when(col("calculatedCol") === 1, // e.g. "2018-19" for a January 2019 date
        concat((year(parsed) - 1).cast("string"), lit("-"), substring(year(parsed).cast("string"), 3, 2)))
        .when(col("calculatedCol") === 2, lit("some other logic"))
        .otherwise(lit("nothing match")))
}
// calling your function on the DataFrame whose date column you want to transform:
val transformed = transformDate("date_column", df)
Note that some functions need a Column as argument, not a plain value, so use lit() to wrap such values.
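For example (firstName and lastName are just placeholder column names):

// concat takes Column arguments, so the literal separator has to be wrapped in lit()
df.withColumn("label", concat(col("firstName"), lit("-"), col("lastName")))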
A UDF is not needed (and in terms of performance it is not recommendable), but you can use one in the following way:
val upper: String => String = _.toUpperCase
import org.apache.spark.sql.functions.udf
val upperUDF = udf(upper)
df.withColumn("upper", upperUDF('text)).show
Here, the upper function is where you would put the logic to transform the date column.
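As a sketch of what that logic could look like for the date column from the question, written as a plain String => String function (the exact output format here is only an assumption):

import java.time.LocalDate
import java.time.format.DateTimeFormatter
import org.apache.spark.sql.functions.{col, udf}

// parse "dd-MM-yyyy" and build something like "2018-19" for January dates
val datePart: String => String = { s =>
  val d = LocalDate.parse(s, DateTimeFormatter.ofPattern("dd-MM-yyyy"))
  if (d.getMonthValue == 1) s"${d.getYear - 1}-${d.getYear.toString.takeRight(2)}"
  else "some other logic"
}

val datePartUDF = udf(datePart)
df.withColumn("datePart", datePartUDF(col("date_column"))).show()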
Upvotes: 0
Reputation: 10362
Try the code below.
scala> import org.joda.time.format._
import org.joda.time.format._
scala> spark.udf.register("datePart",(date:String) => DateTimeFormat.forPattern("MM-dd-yyyy").parseDateTime(date).toString(DateTimeFormat.forPattern("MMyyyy")))
res102: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,StringType,Some(List(StringType)))
scala> spark.sql("""select datePart("03-01-2019") as datepart""").show
+--------+
|datepart|
+--------+
| 032019|
+--------+
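If you also need the same logic on a DataFrame rather than only in a SQL string, the registered function can be reused with callUDF (date_column is a placeholder column name):

import org.apache.spark.sql.functions.{callUDF, col}

df.withColumn("datepart", callUDF("datePart", col("date_column"))).show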
Upvotes: 0