fmv1992

Reputation: 322

Apache-Spark UDF defined inside object raises "No TypeTag available for String"

I get different behavior when I paste a function into an interactive session versus compiling it with sbt.

Minimal, Complete, and Verifiable example. Compiling with sbt:

$ sbt package 
[error] src/main/scala/xxyy.scala:6: No TypeTag available for String
[error]     val correctDiacritics = udf((s: scala.Predef.String) => {
[error]                                ^
[error] two errors found
[error] (compile:compileIncremental) Compilation failed
[error] Total time: 9 s, completed May 22, 2018 2:22:52 PM
$ cat src/main/scala/xxyy.scala 
package xxx.yyy
import org.apache.spark.sql.functions.udf
object DummyObject {
  val correctDiacritics = udf((s: scala.Predef.String) => {
    s.replaceAll("è", "e")
      .replaceAll("é", "e")
      .replaceAll("à", "a")
      .replaceAll("ç", "c")
  })
}

The code above fails to compile. During an interactive session, however:

// During the `spark-shell` session.
// Entering paste mode (ctrl-D to finish)
import org.apache.spark.sql.functions.udf
object DummyObject {
  val correctDiacritics = udf((s: scala.Predef.String) => {
    s.replaceAll("è", "e")
      .replaceAll("é", "e")
      .replaceAll("à", "a")
      .replaceAll("ç", "c")
  })
}
// Exiting paste mode, now interpreting.
// import org.apache.spark.sql.functions.udf
// defined object DummyObject
// Proceeds successfully.
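One quick way to see which Scala version code is actually running against (useful when comparing the shell with an sbt build) is a sketch like the following, using only the standard library; `VersionCheck` is an illustrative name, not part of the project above:

```scala
// Prints the Scala library version the code was compiled and run against,
// e.g. "version 2.11.12". This must agree with the Spark artifact's
// Scala suffix (such as _2.11) for UDF type tags to resolve.
object VersionCheck extends App {
  println(scala.util.Properties.versionString)
}
```

In `spark-shell` the same check can be pasted directly, since the REPL and the bundled Spark build always share one Scala version.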

Upvotes: 1

Views: 710

Answers (1)

Alper t. Turker

Reputation: 35249

Your build definition is incorrect:

  • You build your project with Scala 2.11.12
  • But use Spark dependencies built with Scala 2.10

Because Scala is not binary compatible between major versions, you get an error.

Instead of embedding the Scala version in the artifact name, it is better to use %%, which appends the project's Scala binary version automatically:

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "2.1.0",
  "org.apache.spark" %% "spark-core" % "2.1.0",
  "org.apache.spark" %% "spark-mllib" % "2.1.0"
)

Otherwise, make sure you use the build matching your Scala version:

libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-sql_2.11" % "2.1.0",
  "org.apache.spark" % "spark-core_2.11" % "2.1.0",
  "org.apache.spark" % "spark-mllib_2.11" % "2.1.0"
)
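Putting the pieces together, a complete build definition might look like the sketch below; the project name, Scala version, and Spark version are assumptions and should be adjusted to match your environment:

```scala
// build.sbt (sketch; versions are illustrative)
name := "xxyy"

scalaVersion := "2.11.12"

// %% resolves to _2.11 artifacts here, keeping Spark's Scala
// build aligned with the project's own Scala version.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "2.1.0",
  "org.apache.spark" %% "spark-sql"   % "2.1.0",
  "org.apache.spark" %% "spark-mllib" % "2.1.0"
)
```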

Upvotes: 2
