Max Song

Reputation: 1687

Null Pointer Exception in Spark rdd.RDD.take

Spark's error messages don't always point back to the offending line of your own code, so for future reference, this question is for anyone who gets a NullPointerException with a stack trace that looks like this:

java.lang.NullPointerException
    at org.apache.spark.rdd.RDD.take(RDD.scala:850)
    at org.apache.spark.rdd.RDD.first(RDD.scala:862)
    at modelBuilding$$anonfun$3.apply(modelBuilding.scala:46)
    at modelBuilding$$anonfun$3.apply(modelBuilding.scala:46)
    at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:390)
    at scala.collection.Iterator$$anon$20.hasNext(Iterator.scala:634)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
    at scala.collection.AbstractIterator.to(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
    at org.apache.spark.rdd.RDD$$anonfun$4.apply(RDD.scala:608)
    at org.apache.spark.rdd.RDD$$anonfun$4.apply(RDD.scala:608)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:884)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:884)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
    at org.apache.spark.scheduler.Task.run(Task.scala:53)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:211)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:45)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

Upvotes: 1

Views: 2927

Answers (1)

Max Song

Reputation: 1687

Thankfully, a variant of this problem is discussed here:

http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-NullPointerException-met-when-computing-new-RDD-or-use-count-td2766.html

One way this comes up is that an RDD cannot be referenced inside the closure of another RDD operation (map, filter, and so on). The closure is serialized and shipped to the executors, where the driver-side RDD reference is null, producing the NullPointerException above.

For example, if the original code is

val shiftRDD = qtRdd.filter { _.qtExtEvents.qt.date.getTime() != qtRdd.first().qtExtEvents.qt.date.getTime() }

the reference to the RDD has to be refactored out into a plain local value:

val firstVal = qtRdd.first().qtExtEvents.qt.date.getTime()
val shiftOneqtRdd = qtRdd.filter { _.qtExtEvents.qt.date.getTime() != firstVal }
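For readers without the original `qtRdd` data, here is a minimal, self-contained sketch of the same pattern. The object name, `local[*]` master, and the RDD of timestamps are all illustrative assumptions, not part of the question:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object FilterFirstExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("FilterFirst").setMaster("local[*]"))
    val times = sc.parallelize(Seq(100L, 100L, 200L, 300L))

    // Broken: `times` is captured inside its own filter closure, so the
    // executors see a null RDD reference and throw NullPointerException.
    // val shifted = times.filter(_ != times.first())

    // Fixed: call first() on the driver, and capture only the plain Long
    // in the closure.
    val firstTime = times.first()
    val shifted = times.filter(_ != firstTime)

    println(shifted.collect().mkString(","))
    sc.stop()
  }
}
```

The same hoisting applies to any RDD method (count, collect, first, ...) used inside another RDD's closure: compute it once on the driver, then close over the resulting plain value.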

Upvotes: 3
