Reputation: 7583
I have a Play web app using Scala 2.11.8 and Spark ("spark-core" % "2.2.0" and "spark-sql" % "2.2.0"). I am trying to read a file that contains movie ratings and run some transformations on it. When I call the function that splits on tabs, movieLines.map(x => (x.split("\t")(1).toInt, 1)), I get an error that I suspect is caused by the guava library dependency; every fix I find on Google points in that direction. But I cannot figure out how to exclude the conflicting guava dependency.
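From the sbt docs I understand that excluding a transitive artifact looks roughly like the snippet below, but I am only guessing that com.google.guava / guava is the right artifact to exclude and that spark-core is where it comes from:

libraryDependencies += ("org.apache.spark" %% "spark-core" % "2.2.0")
  .exclude("com.google.guava", "guava")

I have not been able to confirm that this actually resolves the conflict.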
Here is my code:
def popularMovies() = Action { implicit request: Request[AnyContent] =>
  Util.downloadSourceFile("downloads/ml-100k.zip", "http://files.grouplens.org/datasets/movielens/ml-100k.zip")
  Util.unzip("downloads/ml-100k.zip")
  val sparkContext = SparkCommons.sparkSession.sparkContext
  println("got sparkContext")
  val movieLines = sparkContext.textFile("downloads/ml-100k/u.data")
  println("popularMovies")
  println(movieLines)
  // Map to (movieID, 1) tuples
  val movieTuples = movieLines.map(x => (x.split("\t")(1).toInt, 1))
  println("movieTuples")
  println(movieTuples)
  // Count up all the 1's for each movie
  val movieCounts = movieTuples.reduceByKey((x, y) => x + y)
  println("movieCounts")
  println(movieCounts)
  // Flip (movieId, count) to (count, movieId)
  val movieCountFlipped = movieCounts.map(x => (x._2, x._1))
  println(movieCountFlipped)
  // Sort by count
  val sortedMovies = movieCountFlipped.sortByKey()
  println(sortedMovies)
  // Collect and print the result
  val results = sortedMovies.collect().toList.mkString(",\n")
  println(results)
  Ok("[" + results + "]")
}
and the error:
[error] application -
! @76oh9h40m - Internal server error, for (GET) [/api/popularMovies] ->
play.api.http.HttpErrorHandlerExceptions$$anon$1: Execution exception[[RuntimeException: java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.mapred.FileInputFormat]]
at play.api.http.HttpErrorHandlerExceptions$.throwableToUsefulException(HttpErrorHandler.scala:255)
at play.api.http.DefaultHttpErrorHandler.onServerError(HttpErrorHandler.scala:180)
at play.core.server.AkkaHttpServer$$anonfun$3.applyOrElse(AkkaHttpServer.scala:311)
at play.core.server.AkkaHttpServer$$anonfun$3.applyOrElse(AkkaHttpServer.scala:309)
at scala.concurrent.Future$$anonfun$recoverWith$1.apply(Future.scala:346)
at scala.concurrent.Future$$anonfun$recoverWith$1.apply(Future.scala:345)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
Caused by: java.lang.RuntimeException: java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.mapred.FileInputFormat
at play.api.mvc.ActionBuilder$$anon$2.apply(Action.scala:424)
at play.api.mvc.Action$$anonfun$apply$2.apply(Action.scala:96)
at play.api.mvc.Action$$anonfun$apply$2.apply(Action.scala:89)
at play.api.libs.streams.StrictAccumulator$$anonfun$mapFuture$2$$anonfun$1.apply(Accumulator.scala:174)
at play.api.libs.streams.StrictAccumulator$$anonfun$mapFuture$2$$anonfun$1.apply(Accumulator.scala:174)
at scala.util.Try$.apply(Try.scala:192)
at play.api.libs.streams.StrictAccumulator$$anonfun$mapFuture$2.apply(Accumulator.scala:174)
at play.api.libs.streams.StrictAccumulator$$anonfun$mapFuture$2.apply(Accumulator.scala:170)
at scala.Function1$$anonfun$andThen$1.apply(Function1.scala:52)
at scala.Function1$$anonfun$andThen$1.apply(Function1.scala:52)
Caused by: java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.mapred.FileInputFormat
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:312)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:194)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
Upvotes: 1
Views: 885
Reputation: 7583
I added the dependency below and it fixed my issue. As far as I can tell, the problem is that an older Hadoop FileInputFormat on the classpath calls a Guava Stopwatch constructor that newer Guava versions no longer expose, and Hadoop 2.7.x no longer uses that constructor.
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.7.2"
Upvotes: 5