Felipe

Reputation: 7583

Guava dependency error for Spark + Play framework using Scala

I have a Play web app using Scala 2.11.8 with Spark "spark-core" % "2.2.0" and "spark-sql" % "2.2.0". I am trying to read a file that contains movie ratings and run some transformations on it. As soon as I use the transformation that splits each line on tabs (movieLines.map(x => (x.split("\t")(1).toInt, 1))), I get an error that I suspect is caused by a Guava dependency conflict; every fix my Google searches turn up is based on that diagnosis. However, I cannot figure out how to exclude the conflicting Guava dependency.
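
For reference, the sbt syntax for excluding a transitive dependency per artifact seems to be the following. This is only a sketch of what I tried to piece together; I am not sure that excluding Guava from the Spark artifacts is even the right thing to do, which is part of my question:

    // build.sbt -- sketch only; drops the transitive Guava from the Spark artifacts
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "2.2.0" exclude("com.google.guava", "guava"),
      "org.apache.spark" %% "spark-sql"  % "2.2.0" exclude("com.google.guava", "guava")
    )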

Here is my code:

def popularMovies() = Action { implicit request: Request[AnyContent] =>
    Util.downloadSourceFile("downloads/ml-100k.zip", "http://files.grouplens.org/datasets/movielens/ml-100k.zip")
    Util.unzip("downloads/ml-100k.zip")

    val sparkContext = SparkCommons.sparkSession.sparkContext
    println("got sparkContext")

    val movieLines = sparkContext.textFile("downloads/ml-100k/u.data")
    println("popularMovies")
    println(movieLines)

    // Map to (movieID , 1) tuples
    val movieTuples = movieLines.map(x => (x.split("\t")(1).toInt, 1))
    println("movieTuples")
    println(movieTuples)

    // Count up all the 1's for each movie
    val movieCounts = movieTuples.reduceByKey((x, y) => x + y)
    println("movieCounts")
    println(movieCounts)

    // Flip (movieId, count) to (count, movieId)
    val movieCountFlipped = movieCounts.map(x => (x._2, x._1))
    println(movieCountFlipped)

    // Sort
    val sortedMovies = movieCountFlipped.sortByKey()
    println(sortedMovies)

    // collect and print the result
    val results = sortedMovies.collect().toList.mkString(",\n")
    println(results)

    Ok("[" + results + "]")
  }

and the error:

[error] application - 

! @76oh9h40m - Internal server error, for (GET) [/api/popularMovies] ->

play.api.http.HttpErrorHandlerExceptions$$anon$1: Execution exception[[RuntimeException: java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.mapred.FileInputFormat]]
    at play.api.http.HttpErrorHandlerExceptions$.throwableToUsefulException(HttpErrorHandler.scala:255)
    at play.api.http.DefaultHttpErrorHandler.onServerError(HttpErrorHandler.scala:180)
    at play.core.server.AkkaHttpServer$$anonfun$3.applyOrElse(AkkaHttpServer.scala:311)
    at play.core.server.AkkaHttpServer$$anonfun$3.applyOrElse(AkkaHttpServer.scala:309)
    at scala.concurrent.Future$$anonfun$recoverWith$1.apply(Future.scala:346)
    at scala.concurrent.Future$$anonfun$recoverWith$1.apply(Future.scala:345)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
    at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
    at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
    at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
Caused by: java.lang.RuntimeException: java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.mapred.FileInputFormat
    at play.api.mvc.ActionBuilder$$anon$2.apply(Action.scala:424)
    at play.api.mvc.Action$$anonfun$apply$2.apply(Action.scala:96)
    at play.api.mvc.Action$$anonfun$apply$2.apply(Action.scala:89)
    at play.api.libs.streams.StrictAccumulator$$anonfun$mapFuture$2$$anonfun$1.apply(Accumulator.scala:174)
    at play.api.libs.streams.StrictAccumulator$$anonfun$mapFuture$2$$anonfun$1.apply(Accumulator.scala:174)
    at scala.util.Try$.apply(Try.scala:192)
    at play.api.libs.streams.StrictAccumulator$$anonfun$mapFuture$2.apply(Accumulator.scala:174)
    at play.api.libs.streams.StrictAccumulator$$anonfun$mapFuture$2.apply(Accumulator.scala:170)
    at scala.Function1$$anonfun$andThen$1.apply(Function1.scala:52)
    at scala.Function1$$anonfun$andThen$1.apply(Function1.scala:52)
Caused by: java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.mapred.FileInputFormat
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:312)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:194)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)

Upvotes: 1

Views: 885

Answers (1)

Felipe

Reputation: 7583

Adding this dependency to my build.sbt fixed the issue:

libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.7.2"
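
For anyone wondering why this works: as far as I can tell, Hadoop 2.7 replaced the Guava Stopwatch call in FileInputFormat with Hadoop's own StopWatch (HADOOP-11032), so a 2.7.x hadoop-client no longer clashes with the newer Guava that Play pulls in. With that, the dependency section of my build.sbt looks roughly like this (versions are the ones from my project):

    // build.sbt -- pinning hadoop-client explicitly alongside the Spark artifacts
    libraryDependencies ++= Seq(
      "org.apache.spark"  %% "spark-core"    % "2.2.0",
      "org.apache.spark"  %% "spark-sql"     % "2.2.0",
      "org.apache.hadoop"  % "hadoop-client" % "2.7.2"
    )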

Upvotes: 5
