K.T.A

Reputation: 11

Spark Scala UDF returning a case class

import org.apache.spark.sql.functions._

case class oneClass(a: Int, b: String, c: String)

val doSomthing = udf((t1: Seq[String], str: String, t2: Seq[String]) => {
  val pos = t1.indexOf(str)
  if (pos >= 0) oneClass(pos, str, t2(pos))
  // without checking pos, t2(pos) throws IndexOutOfBoundsException when str is not found
  // with the check but no else branch, the udf's return type becomes Any, and using it throws an exception
})

How can I return the case class only when pos >= 0, while the UDF still always returns a case class?

Upvotes: 1

Views: 1420

Answers (1)

Raphael Roth

Reputation: 27373

Either throw an exception if this should not happen (the Spark job will fail):

val doSomthing = udf((t1: Seq[String], str: String, t2: Seq[String]) => {
  val pos = t1.indexOf(str)
  if (pos >= 0) oneClass(pos, str, t2(pos))
  else throw new IllegalArgumentException
})

Otherwise, use Option:

val doSomthing = udf((t1: Seq[String], str: String, t2: Seq[String]) => {
  val pos = t1.indexOf(str)
  if (pos >= 0) Some(oneClass(pos, str, t2(pos))) else None
})

In the latter case, the result will be null in your DataFrame (None translates to null).
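For illustration, a minimal usage sketch (the DataFrame df and the column names "words" and "values" are hypothetical, assumed to be of type array<string>):

import org.apache.spark.sql.functions.{col, lit}

// rows where "key" is not found in "words" get null in the "match" column
val result = df.withColumn("match", doSomthing(col("words"), lit("key"), col("values")))

// null results can then simply be filtered out
result.filter(col("match").isNotNull).show()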

Another pattern that can be used is to return a result only if no exception is thrown:

val doSomthing = udf((t1: Seq[String], str: String, t2: Seq[String]) => {
  scala.util.Try {
    val pos = t1.indexOf(str)
    oneClass(pos, str, t2(pos))
  }.toOption
})

This can be useful for testing, but I don't consider it good practice.
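To see why: Try(...).toOption turns any failure into None, not just the missing-value case. A minimal sketch with hypothetical inputs:

scala.util.Try(Seq("a", "b")(5)).toOption      // None: the expected IndexOutOfBoundsException is swallowed
scala.util.Try(sys.error("real bug")).toOption // None: a genuine bug is silently hidden too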

Upvotes: 1
