monster

Reputation: 1782

Apache Spark Task Serialization

This method gives me a task serialization error (Spark's "Task not serializable" exception):

def singleItemPrediction(userid: Int, item: Int) = {
  val userAndItems = useritemrating.filter(x => x._1 == userid && x._2 != item)
  val userMean = userAndItems.aggregate(0.0)((accum, rating) => accum + rating._3, _ + _) / userAndItems.count()
  userMean + userAndItems.aggregate(0.0)((accum, ui) => accum + avgDev(userid, item, ui._2), _ + _) / userAndItems.count()
}
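
For context, useritemrating is an RDD of (userid, itemid, rating) triples, and avgDev is a method defined alongside singleItemPrediction. Roughly this shape (the class name, the SparkContext field, and the file path are illustrative, not my real code):

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

class Recommender(sc: SparkContext) {

  // (userid, itemid, rating) triples
  val useritemrating: RDD[(Int, Int, Double)] =
    sc.textFile("ratings.csv").map { line =>
      val Array(u, i, r) = line.split(",")
      (u.toInt, i.toInt, r.toDouble)
    }

  // average deviation between two items over a user's ratings (body omitted)
  def avgDev(userid: Int, item1: Int, item2: Int): Double = ???

  // singleItemPrediction from above lives in this class as well
}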

Changing the bottom line (what's returned) to:

avgDev(userid, item1, item2), _+_) / userAndItems.count()

does not give an error!

I do not understand why; what is the difference? It has something to do with the avgDev method being called inside the aggregate action, but I'm not sure why that matters. I keep running into these problems and fixing them by trial and error; I'd like to understand why they occur so I can stop making the same mistake and going through this cycle of patching code over and over.

Upvotes: 0

Views: 165

Answers (1)

Daniel Darabos

Reputation: 27455

You can enable serialization debugging with:

-Dsun.io.serialization.extendedDebugInfo=true
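
For example, with spark-submit the flag can be passed to the driver JVM like this (the application class and jar names below are placeholders; task serialization happens on the driver, so the driver setting is the one that matters):

spark-submit \
  --driver-java-options "-Dsun.io.serialization.extendedDebugInfo=true" \
  --class com.example.MyApp \
  myapp.jar

With that enabled, the "Task not serializable" stack trace includes the chain of object references that pulled the non-serializable object into the closure, which in code like the above typically points at the enclosing class being captured because avgDev is one of its methods.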

Upvotes: 1
