Reputation: 3013
I am trying to write a Spark streaming app, using Scala, which is supposed to read the Twitter feed every second, following the instructions provided here.
My code is:
import org.apache.spark._
import org.apache.spark.SparkContext._
import org.apache.spark.streaming._
import org.apache.spark.streaming.twitter._
import org.apache.spark.streaming.StreamingContext._
import TutorialHelper._
object Tutorial {
def main(args: Array[String]) {
// Checkpoint directory
val checkpointDir = TutorialHelper.getCheckpointDirectory()
// Configure Twitter credentials
val apiKey = "blabla"
val apiSecret = "blabla"
val accessToken = "blabla"
val accessTokenSecret = "blabla"
TutorialHelper.configureTwitterCredentials(apiKey, apiSecret, accessToken, accessTokenSecret)
val ssc = new StreamingContext(new SparkConf(), Seconds(1))
val tweets = TwitterUtils.createStream(ssc, None)
val statuses = tweets.map(status => status.getText())
statuses.print()
ssc.checkpoint(checkpointDir)
ssc.start()
ssc.awaitTermination()
}
}
When I try to execute, using spark-submit I get the following log:
Configuring Twitter OAuth
Property twitter4j.oauth.consumerKey set as [blabla]
Property twitter4j.oauth.accessToken set as [blabla]
Property twitter4j.oauth.consumerSecret set as [blabla]
Property twitter4j.oauth.accessTokenSecret set as [blabla]
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/09/28 13:22:11 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/09/28 13:22:11 WARN Utils: Your hostname, ubuntu resolves to a loopback address: 127.0.1.1; using 192.168.163.145 instead (on interface eth0)
15/09/28 13:22:11 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/09/28 13:22:13 INFO Slf4jLogger: Slf4jLogger started
15/09/28 13:22:13 INFO Remoting: Starting remoting
15/09/28 13:22:13 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:59422]
15/09/28 13:22:15 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
15/09/28 13:22:17 INFO WriteAheadLogManager : Recovered 1 write ahead log files from file:/home/nikos/Desktop/spark-1.5.0-bin-hadoop2.6/streaming/scala/checkpoint/receivedBlockMetadata
-------------------------------------------
Time: 1443471738000 ms
-------------------------------------------
15/09/28 13:22:18 INFO WriteAheadLogManager : Attempting to clear 1 old log files in file:/home/nikos/Desktop/spark-1.5.0-bin-hadoop2.6/streaming/scala/checkpoint/receivedBlockMetadata older than 1443471737000: file:/home/nikos/Desktop/spark-1.5.0-bin-hadoop2.6/streaming/scala/checkpoint/receivedBlockMetadata/log-1443468716010-1443468776010
15/09/28 13:22:18 INFO WriteAheadLogManager : Cleared log files in file:/home/nikos/Desktop/spark-1.5.0-bin-hadoop2.6/streaming/scala/checkpoint/receivedBlockMetadata older than 1443471737000
-------------------------------------------
Time: 1443471739000 ms
-------------------------------------------
15/09/28 13:22:19 INFO WriteAheadLogManager : Attempting to clear 0 old log files in file:/home/nikos/Desktop/spark-1.5.0-bin-hadoop2.6/streaming/scala/checkpoint/receivedBlockMetadata older than 1443471738000:
15/09/28 13:22:19 INFO WriteAheadLogManager : Cleared log files in file:/home/nikos/Desktop/spark-1.5.0-bin-hadoop2.6/streaming/scala/checkpoint/receivedBlockMetadata older than 1443471738000
-------------------------------------------
Time: 1443471740000 ms
-------------------------------------------
15/09/28 13:22:20 INFO WriteAheadLogManager : Attempting to clear 0 old log files in file:/home/nikos/Desktop/spark-1.5.0-bin-hadoop2.6/streaming/scala/checkpoint/receivedBlockMetadata older than 1443471739000:
15/09/28 13:22:20 INFO WriteAheadLogManager : Cleared log files in file:/home/nikos/Desktop/spark-1.5.0-bin-hadoop2.6/streaming/scala/checkpoint/receivedBlockMetadata older than 1443471739000
-------------------------------------------
Time: 1443471741000 ms
-------------------------------------------
15/09/28 13:22:21 INFO WriteAheadLogManager : Attempting to clear 0 old log files in file:/home/nikos/Desktop/spark-1.5.0-bin-hadoop2.6/streaming/scala/checkpoint/receivedBlockMetadata older than 1443471740000:
15/09/28 13:22:21 INFO WriteAheadLogManager : Cleared log files in file:/home/nikos/Desktop/spark-1.5.0-bin-hadoop2.6/streaming/scala/checkpoint/receivedBlockMetadata older than 1443471740000
15/09/28 13:22:21 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.AbstractMethodError
at org.apache.spark.Logging$class.log(Logging.scala:52)
at org.apache.spark.streaming.twitter.TwitterReceiver.log(TwitterInputDStream.scala:60)
at org.apache.spark.Logging$class.logInfo(Logging.scala:59)
at org.apache.spark.streaming.twitter.TwitterReceiver.logInfo(TwitterInputDStream.scala:60)
at org.apache.spark.streaming.twitter.TwitterReceiver.onStart(TwitterInputDStream.scala:93)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:148)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:130)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:542)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:532)
at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1975)
at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1975)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/09/28 13:22:21 INFO TwitterStreamImpl: Establishing connection.
15/09/28 13:22:21 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-0,5,main]
java.lang.AbstractMethodError
at org.apache.spark.Logging$class.log(Logging.scala:52)
at org.apache.spark.streaming.twitter.TwitterReceiver.log(TwitterInputDStream.scala:60)
at org.apache.spark.Logging$class.logInfo(Logging.scala:59)
at org.apache.spark.streaming.twitter.TwitterReceiver.logInfo(TwitterInputDStream.scala:60)
at org.apache.spark.streaming.twitter.TwitterReceiver.onStart(TwitterInputDStream.scala:93)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:148)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:130)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:542)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:532)
at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1975)
at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1975)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/09/28 13:22:21 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.AbstractMethodError
at org.apache.spark.Logging$class.log(Logging.scala:52)
at org.apache.spark.streaming.twitter.TwitterReceiver.log(TwitterInputDStream.scala:60)
at org.apache.spark.Logging$class.logInfo(Logging.scala:59)
at org.apache.spark.streaming.twitter.TwitterReceiver.logInfo(TwitterInputDStream.scala:60)
at org.apache.spark.streaming.twitter.TwitterReceiver.onStart(TwitterInputDStream.scala:93)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:148)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:130)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:542)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:532)
at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1975)
at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1975)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/09/28 13:22:21 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-7] shutting down ActorSystem [sparkDriver]
java.lang.AbstractMethodError
at org.apache.spark.Logging$class.log(Logging.scala:52)
at org.apache.spark.streaming.twitter.TwitterReceiver.log(TwitterInputDStream.scala:60)
at org.apache.spark.Logging$class.logInfo(Logging.scala:59)
at org.apache.spark.streaming.twitter.TwitterReceiver.logInfo(TwitterInputDStream.scala:60)
at org.apache.spark.streaming.twitter.TwitterReceiver.onStop(TwitterInputDStream.scala:101)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.stopReceiver(ReceiverSupervisor.scala:169)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.stop(ReceiverSupervisor.scala:136)
at org.apache.spark.streaming.receiver.ReceiverSupervisorImpl$$anon$2$$anonfun$receive$1.applyOrElse(ReceiverSupervisorImpl.scala:79)
at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$processMessage(AkkaRpcEnv.scala:177)
at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1$$anonfun$applyOrElse$4.apply$mcV$sp(AkkaRpcEnv.scala:126)
at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$safelyCall(AkkaRpcEnv.scala:197)
at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1.applyOrElse(AkkaRpcEnv.scala:125)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:59)
at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
at org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42)
at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1.aroundReceive(AkkaRpcEnv.scala:92)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15/09/28 13:22:21 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
15/09/28 13:22:21 ERROR ErrorMonitor: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-7] shutting down ActorSystem [sparkDriver]
java.lang.AbstractMethodError
at org.apache.spark.Logging$class.log(Logging.scala:52)
at org.apache.spark.streaming.twitter.TwitterReceiver.log(TwitterInputDStream.scala:60)
at org.apache.spark.Logging$class.logInfo(Logging.scala:59)
at org.apache.spark.streaming.twitter.TwitterReceiver.logInfo(TwitterInputDStream.scala:60)
at org.apache.spark.streaming.twitter.TwitterReceiver.onStop(TwitterInputDStream.scala:101)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.stopReceiver(ReceiverSupervisor.scala:169)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.stop(ReceiverSupervisor.scala:136)
at org.apache.spark.streaming.receiver.ReceiverSupervisorImpl$$anon$2$$anonfun$receive$1.applyOrElse(ReceiverSupervisorImpl.scala:79)
at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$processMessage(AkkaRpcEnv.scala:177)
at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1$$anonfun$applyOrElse$4.apply$mcV$sp(AkkaRpcEnv.scala:126)
at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$safelyCall(AkkaRpcEnv.scala:197)
at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1.applyOrElse(AkkaRpcEnv.scala:125)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:59)
at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
at org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42)
at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1.aroundReceive(AkkaRpcEnv.scala:92)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15/09/28 13:22:21 WARN ReceiverTracker: Receiver 0 exited but didn't deregister
15/09/28 13:22:21 INFO WriteAheadLogManager : Stopped write ahead log manager
15/09/28 13:22:22 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/09/28 13:22:22 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
15/09/28 13:22:22 WARN AkkaRpcEndpointRef: Error sending message [message = RemoveBroadcast(0,true)] in 1 attempts
org.apache.spark.rpc.RpcTimeoutException: Recipient[Actor[akka://sparkDriver/user/BlockManagerMaster#1080029491]] had already been terminated.. This timeout is controlled by spark.rpc.askTimeout
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcEnv.scala:214)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcEnv.scala:229)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcEnv.scala:225)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:185)
at scala.util.Try$.apply(Try.scala:161)
at scala.util.Failure.recover(Try.scala:185)
at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at org.spark-project.guava.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
at scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:133)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
at scala.concurrent.impl.Promise$DefaultPromise.scala$concurrent$impl$Promise$DefaultPromise$$dispatchOrAddCallback(Promise.scala:280)
at scala.concurrent.impl.Promise$DefaultPromise.onComplete(Promise.scala:270)
at scala.concurrent.Future$class.recover(Future.scala:324)
at scala.concurrent.impl.Promise$DefaultPromise.recover(Promise.scala:153)
at org.apache.spark.rpc.akka.AkkaRpcEndpointRef.ask(AkkaRpcEnv.scala:319)
at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:100)
at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77)
at org.apache.spark.storage.BlockManagerMaster.removeBroadcast(BlockManagerMaster.scala:128)
at org.apache.spark.broadcast.TorrentBroadcast$.unpersist(TorrentBroadcast.scala:228)
at org.apache.spark.broadcast.TorrentBroadcastFactory.unbroadcast(TorrentBroadcastFactory.scala:45)
at org.apache.spark.broadcast.BroadcastManager.unbroadcast(BroadcastManager.scala:67)
at org.apache.spark.ContextCleaner.doCleanupBroadcast(ContextCleaner.scala:214)
at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:170)
at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:161)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:161)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1136)
at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:154)
at org.apache.spark.ContextCleaner$$anon$3.run(ContextCleaner.scala:67)
Caused by: akka.pattern.AskTimeoutException: Recipient[Actor[akka://sparkDriver/user/BlockManagerMaster#1080029491]] had already been terminated.
at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:132)
at org.apache.spark.rpc.akka.AkkaRpcEndpointRef.ask(AkkaRpcEnv.scala:307)
... 14 more
Upvotes: 2
Views: 1528
Reputation: 420
For spark 1.6.x use scala 2.10.6 version, and make sure it's set in build.sbt file.
Make sure that your hostname is linked to your current IP, in linux environment the hosts are resolved in /etc/hosts file.
Upvotes: 0
Reputation: 231
This question was also answered in:
Spark streaming StreamingContext.start() - Error starting receiver 0
It can be resolved by changing your scalaVersion and libraryDependencies in your build.sbt, to match the ones for your Spark version.
e.g.:
scalaVersion := "2.11.8"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-streaming" % "1.6.1" % "provided",
"org.apache.spark" %% "spark-streaming-twitter" % "1.6.1"
)
Be sure to use the newly generated scala-2.11 classes, e.g.:
../../../../spark-1.6.1/bin/spark-submit --class Tutorial /Users/mendezr/development/spark/exercises_spark_submit_2014/usb/streaming/scala/target/scala-2.11/Tutorial-assembly-0.1-SNAPSHOT.jar
Upvotes: 2