preitam ojha

Reputation: 239

Error while calling udf from within withColumn in Spark using Scala

I receive an error while calling a UDF from within withColumn in Spark using Scala. The error occurs while building with SBT.

val hiveRDD = sqlContext.sql("select * from iac_trinity.ctg_us_clickstream")
hiveRDD.persist()

val trnEventDf = hiveRDD
  .withColumn("system_generated_id", getAuthId(hiveRDD("session_user_id")))
  .withColumn("application_assigned_event_id", hiveRDD("event_event_id"))


val getAuthId = udf((session_user_id: String) => {
  if (session_user_id != None) {
    if (session_user_id != "NULL") {
      if (session_user_id != "null") {
        session_user_id
      } else "-1"
    } else "-1"
  } else "-1"
})

The error I receive is:

scala:58: No TypeTag available for String
val getAuthId = udf((session_user_id:String) => {

It compiles properly when I use (session_user_id:Any) instead of (session_user_id:String), but then fails at runtime because Any is not a supported type in Spark. Please let me know how to handle this.

Upvotes: 0

Views: 527

Answers (1)

Justin Pihony

Reputation: 67075

Have you tried being explicit with your types?

udf[String, String]((session_user_id:String)...
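
For reference, a minimal sketch of what the explicitly typed version could look like. The column names are taken from the question; collapsing the nested null checks into a single condition is my own simplification, so adjust it if your data needs the exact original logic.

import org.apache.spark.sql.functions.udf

// Explicit type parameters: return type first, then the argument type,
// so the compiler does not need to find a TypeTag for an inferred type.
val getAuthId = udf[String, String]((session_user_id: String) =>
  // Treat null, "NULL" and "null" as missing and map them to "-1"
  if (session_user_id == null || session_user_id.equalsIgnoreCase("null")) "-1"
  else session_user_id
)

val trnEventDf = hiveRDD
  .withColumn("system_generated_id", getAuthId(hiveRDD("session_user_id")))
  .withColumn("application_assigned_event_id", hiveRDD("event_event_id"))

Giving udf the [String, String] type arguments means the TypeTag is resolved from the explicit types rather than inferred, which avoids the "No TypeTag available for String" compile error.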

Upvotes: 1
