Reputation: 11627
I've written unit tests based on the DataframeGenerator example, which lets you generate mock DataFrames on the fly
After executing the following commands successfully
sbt clean
sbt update
sbt compile
I get the errors shown in the output below upon running either of the following commands
sbt assembly
sbt test -- -oF
Output
...
[info] SearchClicksProcessorTest:
17/11/24 14:19:04 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/11/24 14:19:07 WARN SparkContext: Using an existing SparkContext; some configuration may not take effect.
17/11/24 14:19:18 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
17/11/24 14:19:18 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
17/11/24 14:19:19 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
[info] - testExplodeMap *** FAILED ***
[info] ExceptionInInitializerError was thrown during property evaluation.
[info] Message: "None"
[info] Occurred when passed generated values (
[info]
[info] )
[info] - testFilterByClicks *** FAILED ***
[info] NoClassDefFoundError was thrown during property evaluation.
[info] Message: Could not initialize class org.apache.spark.rdd.RDDOperationScope$
[info] Occurred when passed generated values (
[info]
[info] )
[info] - testGetClicksData *** FAILED ***
[info] NoClassDefFoundError was thrown during property evaluation.
[info] Message: Could not initialize class org.apache.spark.rdd.RDDOperationScope$
[info] Occurred when passed generated values (
[info]
[info] )
...
[info] *** 3 TESTS FAILED ***
[error] Failed: Total 6, Failed 3, Errors 0, Passed 3
[error] Failed tests:
[error] com.company.spark.ml.pipelines.search.SearchClicksProcessorTest
[error] (root/test:test) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 73 s, completed 24 Nov, 2017 2:19:28 PM
Things that I've tried unsuccessfully:
My questions are:
EDIT-1: My unit-test class contains several methods like the one below
import com.holdenkarau.spark.testing.DataframeGenerator
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
import org.scalacheck.Prop
import org.scalatest.FunSuite
import org.scalatest.prop.Checkers

class SearchClicksProcessorTest extends FunSuite with Checkers {
  import spark.implicits._  // assumes a SparkSession named `spark` is in scope (see note below)

  test("testGetClicksData") {
    // Input schema for the generated DataFrame
    val schemaIn = StructType(List(
      StructField("rank", IntegerType),
      StructField("city_id", IntegerType),
      StructField("target", IntegerType)
    ))
    // Expected output schema after the transformation
    val schemaOut = StructType(List(
      StructField("clicked_res_rank", IntegerType),
      StructField("city_id", IntegerType)
    ))
    val dataFrameGen = DataframeGenerator.arbitraryDataFrame(spark.sqlContext, schemaIn)
    val property = Prop.forAll(dataFrameGen.arbitrary) { dfIn: DataFrame =>
      dfIn.cache()
      val dfOut: DataFrame = dfIn.transform(SearchClicksProcessor.getClicksData)
      dfIn.schema === schemaIn &&
        dfOut.schema === schemaOut &&
        dfIn.filter($"target" === 1).count() === dfOut.count()
    }
    check(property)
  }
}
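Note that the test assumes a SparkSession named spark is already in scope. A minimal sketch of providing one, where the master and app name are assumptions of mine:

import org.apache.spark.sql.SparkSession

// Lazily construct a local SparkSession for the test suite
lazy val spark: SparkSession = SparkSession.builder()
  .master("local[*]")
  .appName("SearchClicksProcessorTest")
  .getOrCreate()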
while build.sbt looks like this
// core settings
organization := "com.company"
scalaVersion := "2.11.11"
name := "repo-name"
version := "0.0.1"

// cache options
offline := false
updateOptions := updateOptions.value.withCachedResolution(true)

// aggregate options
aggregate in assembly := false
aggregate in update := false

// fork options
fork in Test := true

// common libraryDependencies
libraryDependencies ++= Seq(
  scalaTest,
  typesafeConfig,
  ...
  scalajHttp
)
libraryDependencies ++= allAwsDependencies
libraryDependencies ++= SparkDependencies.allSparkDependencies

assemblyMergeStrategy in assembly := {
  case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard
  ...
  case _ => MergeStrategy.first
}

// hyphens are not legal in Scala identifiers, so the module vals are
// written without them here
lazy val module1 = project in file("directory-1")
lazy val module2 = (project in file("directory-2")).
  dependsOn(module1).
  aggregate(module1)
lazy val root = (project in file(".")).
  dependsOn(module2).
  aggregate(module2)
Upvotes: 2
Views: 2762
Reputation: 6139
I have had a similar problem, and after investigating I found that adding lazy before a val solved my issue. My guess is that running a Scala program under ScalaTest invokes a slightly different initialization sequence: whereas a normal Scala execution initializes vals top-down in source-line order (with nested object {...} blocks initialized the same way), the same code under ScalaTest initializes the vals in nested object {...} blocks before the vals that appear above the object {...} block.
This is admittedly vague, I know, but deferring initialization by prefixing vals with lazy could solve the test issue here.
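As an illustration, a contrived sketch of the failure mode described above (all names are hypothetical):

object Pipeline {
  val basePath: String = "hdfs://namenode"

  object Paths {
    // If this block is initialized before basePath (as described above),
    // basePath is still null here, and dependent code can blow up with
    // ExceptionInInitializerError / NoClassDefFoundError
    val dataPath: String = basePath + "/data"
  }

  // Workaround: defer evaluation until first access
  lazy val safeDataPath: String = basePath + "/data"
}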
The crucial thing here is that it doesn't occur in normal execution, only test execution, and in my case it was only occurring when using lambdas with taps in this form:
...
.tap(x =>
  hook_feld_erweiterungen_hook(
    abc = theProblematicVal
  )
)
...
Upvotes: 1
Reputation: 11627
P.S. Please read the comments on the original question before reading this answer
Even the popular solution of overriding SBT's transitive dependency on fasterxml.jackson didn't work for me; some more changes were required (the ExceptionInInitializerError was gone, but another error cropped up)
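For reference, that popular override looks roughly like the following in build.sbt; the group IDs are real, but the pinned version below is an assumption and must match what your Spark distribution was built against:

// sbt 0.13 syntax: pin the Jackson artifacts Spark pulls in transitively
// (version 2.6.7 is an assumed example, not a recommendation)
dependencyOverrides ++= Set(
  "com.fasterxml.jackson.core" % "jackson-core" % "2.6.7",
  "com.fasterxml.jackson.core" % "jackson-databind" % "2.6.7",
  "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.6.7"
)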
Finally (in addition to the above-mentioned fix) I ended up creating DataFrames in a different way (as opposed to the StructType approach used here). I created them as
spark.sparkContext.parallelize(Seq(MyType(...))).toDF()
where MyType is a case class matching the schema of the DataFrame
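For instance, a minimal sketch of that approach (MyType and its sample values are hypothetical):

import spark.implicits._  // required for .toDF() on an RDD of a case class

case class MyType(clicked_res_rank: Int, city_id: Int)

val df = spark.sparkContext
  .parallelize(Seq(MyType(1, 42), MyType(3, 7)))
  .toDF()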
While implementing this solution, I encountered a small problem: while the datatypes of the schema generated from the case class were correct, the nullability of the fields often mismatched; the fix for this issue was found here
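One common way to work around such a mismatch (an assumption on my part, not necessarily the exact fix referenced above) is to rebuild the DataFrame against an explicitly adjusted schema:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.StructType

// Copy every field of the schema with the desired nullability, then
// re-create the DataFrame from the underlying RDD[Row]
def setNullable(df: DataFrame, nullable: Boolean): DataFrame = {
  val adjusted = StructType(df.schema.map(_.copy(nullable = nullable)))
  df.sparkSession.createDataFrame(df.rdd, adjusted)
}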
I'll openly admit that I'm not sure which was the actual fix: the fasterxml.jackson dependency override or the alternate way of creating DataFrames, so please feel free to fill in the gaps in understanding / investigating the issue
Upvotes: 0