Reputation: 322
I get different behavior when copy-pasting a function into an interactive session versus compiling it with sbt.
Minimal, Complete, and Verifiable example. First, the sbt build:
$ sbt package
[error] src/main/scala/xxyy.scala:6: No TypeTag available for String
[error] val correctDiacritics = udf((s: scala.Predef.String) => {
[error] ^
[error] two errors found
[error] (compile:compileIncremental) Compilation failed
[error] Total time: 9 s, completed May 22, 2018 2:22:52 PM
$ cat src/main/scala/xxyy.scala
package xxx.yyy
import org.apache.spark.sql.functions.udf
object DummyObject {
val correctDiacritics = udf((s: scala.Predef.String) => {
s.replaceAll("è","e")
.replaceAll("é","e")
.replaceAll("à","a")
.replaceAll("ç","c")
})
}
The code above does not compile. However, it works during an interactive session:
// During the `spark-shell` session.
// Entering paste mode (ctrl-D to finish)
import org.apache.spark.sql.functions.udf
object DummyObject {
val correctDiacritics = udf((s: scala.Predef.String) => {
s.replaceAll("è","e")
.replaceAll("é","e")
.replaceAll("à","a")
.replaceAll("ç","c")
})
}
// Exiting paste mode, now interpreting.
// import org.apache.spark.sql.functions.udf
// defined object DummyObject
// Proceeds successfully.
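For reference, the UDF can be exercised in the same spark-shell session; the sample data and column name below are made up for illustration:
// Hypothetical sanity check, run after pasting DummyObject above.
import spark.implicits._
val df = Seq("crème", "café", "ça").toDF("word")
// Apply the UDF to the column; accents è, é, à, ç are stripped.
df.select(DummyObject.correctDiacritics($"word")).show()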
Versions:
I'm using Scala 2.11.
I'm using Spark 2.1.0.
build.sbt:
name := "my_app"
version := "0.0.1"
scalaVersion := "2.11.12"
resolvers ++= Seq(
Resolver sonatypeRepo "public",
Resolver typesafeRepo "releases"
)
resolvers += "MavenRepository" at "https://mvnrepository.com/"
libraryDependencies ++= Seq(
// "org.apache.spark" %% "spark-core" % "2.1.0",
// "org.apache.spark" %% "spark-sql" % "2.1.0",
//"org.apache.spark" %% "spark-core_2.10" % "1.0.2",
// "org.apache.spark" %
"org.apache.spark" % "spark-sql_2.10" % "2.1.0",
"org.apache.spark" % "spark-core_2.10" % "2.1.0",
"org.apache.spark" % "spark-mllib_2.10" % "2.1.0"
)
Related questions:
udf No TypeTag available for type string.
Spark, sbt package -- No TypeTag available.
No typeTag available Error in scala spark udf.
Spark UDF error no TypeTag available for string.
Upvotes: 1
Views: 710
Reputation: 35249
Your build definition is incorrect: your scalaVersion is 2.11.12, but your Spark dependencies are the _2.10 builds.
Since Scala is not binary compatible between major versions, you get an error; here it surfaces as the compiler failing to find the implicit TypeTag[String] that udf requires.
Instead of embedding the Scala version in the artifact name, it is better to use %%, which appends the project's Scala binary version automatically:
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-sql" % "2.1.0",
"org.apache.spark" %% "spark-core" % "2.1.0",
"org.apache.spark" %% "spark-mllib" % "2.1.0"
)
Otherwise, make sure you use the artifacts built for your Scala version:
libraryDependencies ++= Seq(
"org.apache.spark" % "spark-sql_2.11" % "2.1.0",
"org.apache.spark" % "spark-core_2.11" % "2.1.0",
"org.apache.spark" % "spark-mllib_2.11" % "2.1.0"
)
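Putting it together, a minimal sketch of a corrected build.sbt, keeping the asker's project name, version, and standard resolvers:
name := "my_app"

version := "0.0.1"

scalaVersion := "2.11.12"

resolvers ++= Seq(
  Resolver.sonatypeRepo("public"),
  Resolver.typesafeRepo("releases")
)

// %% appends the project's Scala binary version (_2.11) to each artifact
// name, so the Spark dependencies always match scalaVersion.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "2.1.0",
  "org.apache.spark" %% "spark-sql"   % "2.1.0",
  "org.apache.spark" %% "spark-mllib" % "2.1.0"
)
With this in place, sbt package should compile the UDF without the TypeTag error.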
Upvotes: 2