y2k-shubham

Reputation: 11627

ExceptionInInitializerError in Scala unit test (Scalacheck, Scalatest)

I've written unit tests referring to the DataframeGenerator example, which allows you to generate mock DataFrames on the fly.
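
For reference, DataframeGenerator comes from Holden Karau's spark-testing-base library. A typical test-scoped sbt dependency looks like the line below; the version is illustrative (it must match your Spark version), so treat it as an assumption:

// spark-testing-base provides DataframeGenerator for property-based DataFrame tests
libraryDependencies += "com.holdenkarau" %% "spark-testing-base" % "2.2.0_0.8.0" % "test"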

After executing the following commands successfully

sbt clean
sbt update
sbt compile

I get the errors shown in the output below upon running either of the following commands:

sbt assembly
sbt test -- -oF
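
As an aside, ScalaTest flags such as -oF (full stack traces) may not be picked up when appended to sbt test as above; they are normally passed through testOnly (e.g. testOnly *SearchClicksProcessorTest -- -oF from the sbt shell) or configured once in build.sbt, as in this minimal sketch:

// always show full stack traces in ScalaTest output
testOptions in Test += Tests.Argument(TestFrameworks.ScalaTest, "-oF")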

Output

...
[info] SearchClicksProcessorTest:
17/11/24 14:19:04 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/11/24 14:19:07 WARN SparkContext: Using an existing SparkContext; some configuration may not take effect.
17/11/24 14:19:18 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
17/11/24 14:19:18 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
17/11/24 14:19:19 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
[info] - testExplodeMap *** FAILED ***
[info]   ExceptionInInitializerError was thrown during property evaluation.
[info]     Message: "None"
[info]     Occurred when passed generated values (
[info]   
[info]     )
[info] - testFilterByClicks *** FAILED ***
[info]   NoClassDefFoundError was thrown during property evaluation.
[info]     Message: Could not initialize class org.apache.spark.rdd.RDDOperationScope$
[info]     Occurred when passed generated values (
[info]   
[info]     )
[info] - testGetClicksData *** FAILED ***
[info]   NoClassDefFoundError was thrown during property evaluation.
[info]     Message: Could not initialize class org.apache.spark.rdd.RDDOperationScope$
[info]     Occurred when passed generated values (
[info]   
[info]     )
...
[info] *** 3 TESTS FAILED ***
[error] Failed: Total 6, Failed 3, Errors 0, Passed 3
[error] Failed tests:
[error]         com.company.spark.ml.pipelines.search.SearchClicksProcessorTest
[error] (root/test:test) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 73 s, completed 24 Nov, 2017 2:19:28 PM

Things that I've tried unsuccessfully

My questions are


EDIT-1: My unit-test class contains several methods like the one below

import com.holdenkarau.spark.testing.DataframeGenerator
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
import org.scalacheck.Prop
import org.scalatest.FunSuite
import org.scalatest.prop.Checkers

class SearchClicksProcessorTest extends FunSuite with Checkers {
  import spark.implicits._

  test("testGetClicksData") {
    val schemaIn = StructType(List(
      StructField("rank", IntegerType),
      StructField("city_id", IntegerType),
      StructField("target", IntegerType)
    ))
    val schemaOut = StructType(List(
      StructField("clicked_res_rank", IntegerType),
      StructField("city_id", IntegerType)
    ))
    val dataFrameGen = DataframeGenerator.arbitraryDataFrame(spark.sqlContext, schemaIn)

    val property = Prop.forAll(dataFrameGen.arbitrary) { dfIn: DataFrame =>
      dfIn.cache()
      val dfOut: DataFrame = dfIn.transform(SearchClicksProcessor.getClicksData)

      dfIn.schema === schemaIn &&
        dfOut.schema === schemaOut &&
        dfIn.filter($"target" === 1).count() === dfOut.count()
    }
    check(property)
  }
}
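
Note that the class above references a spark session that isn't shown in the snippet. A minimal way to provide one (an assumption added for completeness, not necessarily the asker's actual setup) is:

import org.apache.spark.sql.SparkSession

trait SparkSessionTestWrapper {
  // local[2] keeps the tests self-contained in the forked JVM;
  // lazy avoids initialization-order surprises (see the first answer below)
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[2]")
    .appName("unit-tests")
    .getOrCreate()
}

The test class would then extend SparkSessionTestWrapper (a hypothetical name) to bring spark into scope.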

while build.sbt looks like this

// core settings
organization := "com.company"
scalaVersion := "2.11.11"

name := "repo-name"
version := "0.0.1"

// cache options
offline := false
updateOptions := updateOptions.value.withCachedResolution(true)

// aggregate options
aggregate in assembly := false
aggregate in update := false

// fork options
fork in Test := true

// common libraryDependencies
libraryDependencies ++= Seq(
  scalaTest,
  typesafeConfig,
  ...
  scalajHttp
)

libraryDependencies ++= allAwsDependencies
libraryDependencies ++= SparkDependencies.allSparkDependencies

assemblyMergeStrategy in assembly := {
  case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard
  ...
  case _ => MergeStrategy.first
}

lazy val module1 = project in file("directory-1")

lazy val module2 = (project in file("directory-2")).
  dependsOn(module1).
  aggregate(module1)

lazy val root = (project in file(".")).
  dependsOn(module2).
  aggregate(module2)
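
Not shown in the original build, but commonly paired with fork in Test := true for Spark suites (an assumption, not the asker's configuration): running test suites serially and giving the forked JVM more heap, since each suite spins up a SparkContext.

// Spark suites don't tolerate concurrent SparkContexts well; run them one at a time
parallelExecution in Test := false
// give the forked test JVM enough heap for Spark
javaOptions in Test ++= Seq("-Xmx2g")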

Upvotes: 2

Views: 2762

Answers (2)

Hartmut Pfarr

Reputation: 6139

I had a similar problem, and after investigating I found that adding lazy before a val solved my issue. My guess is that running a Scala program under ScalaTest invokes a slightly different initialization sequence. Normal Scala execution initializes vals top-down in source-code order, with nested object { ... } blocks initialized the same way; under ScalaTest, the same code initializes the vals inside nested object { ... } blocks before the vals that appear above the object { ... }.

I know this is vague, but deferring initialization by prefixing vals with lazy could solve the test issue here.
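
A minimal sketch of that failure mode (with assumed names, not the actual code): a nested object forced during the enclosing object's initialization observes a val before it has been assigned, and lazy defers the evaluation to first access:

object InitOrderDemo {
  // `early` forces Inner's initialization while InitOrderDemo is itself
  // still initializing, so Inner.snapshot reads `config` at that moment
  val early: String = Inner.snapshot   // null if `config` is a plain val

  lazy val config: String = "loaded"   // `lazy` defers evaluation to first
                                       // access, so Inner.snapshot gets "loaded"
  object Inner {
    val snapshot: String = config
  }
}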

The crucial thing here is that it doesn't occur in normal execution, only in test execution, and in my case it was only occurring when using lambdas with tap in this form:

...
.tap(x =>
        hook_feld_erweiterungen_hook(
          abc = theProblematicVal
        )
      )
...

Upvotes: 1

y2k-shubham

Reputation: 11627

P.S. Please read the comments on the original question before reading this answer.


  • Even the popular solution of overriding SBT's transitive dependency on fasterxml.jackson didn't work for me as-is; some more changes were required (the ExceptionInInitializerError was gone but another error cropped up). A sketch of the usual override appears at the end of this answer.

  • Finally (in addition to the above-mentioned fix) I ended up creating DataFrames in a different way (as opposed to the StructType-based generator used here). I created them as

    spark.sparkContext.parallelize(Seq(MyType(...))).toDF()

    where MyType is a case class matching the schema of the DataFrame (see the sketch after this list)

  • While implementing this solution, I encountered a small problem: while the datatypes of the schema generated from the case class were correct, the nullability of fields often mismatched; the fix for this issue was found here (the sketch below also shows it)
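
Putting the last two bullets together, here is a sketch (with a hypothetical MyType and illustrative columns, not the asker's actual code) of building the test DataFrame from a case class and then re-applying an explicit schema when only the nullability differs:

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// hypothetical case class mirroring the DataFrame's columns
case class MyType(rank: Int, city_id: Int, target: Int)

val spark = SparkSession.builder()
  .master("local[2]").appName("df-from-case-class").getOrCreate()
import spark.implicits._

// schema is inferred from MyType; primitive Ints come out non-nullable
val df: DataFrame = spark.sparkContext
  .parallelize(Seq(MyType(1, 42, 0), MyType(2, 42, 1)))
  .toDF()

// when the expected schema differs only in nullability,
// re-apply it explicitly over the underlying rows
val schemaIn = StructType(List(
  StructField("rank", IntegerType, nullable = true),
  StructField("city_id", IntegerType, nullable = true),
  StructField("target", IntegerType, nullable = true)
))
val dfNullable: DataFrame = spark.createDataFrame(df.rdd, schemaIn)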


I'll blatantly admit that I'm not sure which was the correct fix: the fasterxml.jackson dependency override or the alternate way of creating the DataFrame, so please feel free to fill in the gaps in understanding / investigating the issue.
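
For completeness, the fasterxml.jackson override mentioned in the first bullet usually takes the following shape in build.sbt. The versions here are illustrative assumptions; the point is that Spark's RDDOperationScope uses jackson internally, which is why a binary-incompatible jackson on the test classpath surfaces as the NoClassDefFoundError seen in the question:

// pin jackson to a version binary-compatible with Spark's own
dependencyOverrides ++= Set(
  "com.fasterxml.jackson.core" % "jackson-core" % "2.6.7",
  "com.fasterxml.jackson.core" % "jackson-databind" % "2.6.7",
  "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.6.7.1"
)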

Upvotes: 0
