Reputation: 11
import org.apache.spark.sql.functions._

case class oneClass(a: Int, b: String, c: String)

val doSomthing = udf((t1: Seq[String], str: String, t2: Seq[String]) => {
  val pos = t1.indexOf(str)
  if (pos >= 0) oneClass(pos, str, t2(pos))
  // if I don't check pos, then t2(pos) with pos = -1 ===> IndexOutOfBoundsException
  // if I do check it, the udf returns Any, and using it ===> Exception
})
How can I return the case class only when pos >= 0, and still always return a case class?
Upvotes: 1
Views: 1420
Reputation: 27373
Either throw an exception if this case should not happen (the Spark job will fail):
val doSomthing = udf((t1: Seq[String], str: String, t2: Seq[String]) => {
  val pos = t1.indexOf(str)
  if (pos >= 0) oneClass(pos, str, t2(pos))
  else throw new IllegalArgumentException
})
otherwise use Option:
val doSomthing = udf((t1: Seq[String], str: String, t2: Seq[String]) => {
  val pos = t1.indexOf(str)
  if (pos >= 0) Some(oneClass(pos, str, t2(pos))) else None
})
In the latter case, your result will be null in your DataFrame (None translates to null).
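To see the Some/None mapping outside of Spark, here is a minimal standalone sketch of the same lookup logic as a plain Scala function (the indexOf and guarded access, without the udf wrapper; the name lookup is illustrative):

```scala
// Hypothetical plain-Scala version of the UDF body, without Spark.
case class OneClass(a: Int, b: String, c: String)

def lookup(t1: Seq[String], str: String, t2: Seq[String]): Option[OneClass] = {
  val pos = t1.indexOf(str)
  // Only build the case class when the element was found.
  if (pos >= 0) Some(OneClass(pos, str, t2(pos))) else None
}

val t1 = Seq("x", "y", "z")
val t2 = Seq("one", "two", "three")

println(lookup(t1, "y", t2))       // Some(OneClass(1,y,two))
println(lookup(t1, "missing", t2)) // None -> null in the DataFrame column
```

Inside Spark, the Option is what makes the null appear: rows where the element is missing get a null struct instead of failing the job.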
A pattern which can also be used is to return a result only if no exception is thrown:
val doSomthing = udf((t1: Seq[String], str: String, t2: Seq[String]) => {
  scala.util.Try {
    val pos = t1.indexOf(str)
    oneClass(pos, str, t2(pos))
  }.toOption
})
This can be useful for testing, but I don't consider it good practice.
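To illustrate why the Try pattern can hide bugs, here is a small standalone sketch (plain Scala, data values are illustrative): any exception inside the block, not just the expected "element not found" case, silently becomes None:

```scala
import scala.util.Try

// Mismatched input lengths: "c" IS found in t1, but t2 is too short,
// so t2(pos) throws IndexOutOfBoundsException - a genuine data error.
val t1 = Seq("a", "b", "c")
val t2 = Seq("only-one")

val result = Try {
  val pos = t1.indexOf("c") // pos = 2
  (pos, t2(pos))            // throws: t2 has only one element
}.toOption

println(result) // None - the data error is silently swallowed
```

With the explicit `if (pos >= 0)` check from the previous snippet, this mismatch would surface as a failure you can see, instead of a null you might misread as "element not found".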
Upvotes: 1