Reputation: 145
I have a very simple Spark Streaming job running locally in standalone mode. There is a custom receiver which reads from a database and passes the data to the main job, which prints the total count. It's not an actual use case; I am just playing around to learn. The problem is that the job gets stuck forever. The logic is very simple, so I don't think it is a processing or memory issue. What is strange is that if I STOP the job, I suddenly see the output of the job execution in the logs, and the other backed-up jobs follow! Can someone help me understand what is going on here?
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}

val spark = SparkSession
  .builder()
  .master("local[1]")
  .appName("SocketStream")
  .getOrCreate()

val ssc = new StreamingContext(spark.sparkContext, Seconds(5))

// Custom receiver that reads from the database
val lines = ssc.receiverStream(new HanaCustomReceiver())
lines.foreachRDD { rdd => println("==============" + rdd.count()) }

ssc.start()
ssc.awaitTermination()
After terminating the program, the following logs roll, which show the execution of the batch:
17/06/05 15:56:16 INFO JobGenerator: Stopping JobGenerator immediately
17/06/05 15:56:16 INFO RecurringTimer: Stopped timer for JobGenerator after time 1496696175000
17/06/05 15:56:16 INFO JobGenerator: Stopped JobGenerator
==============100
Upvotes: 1
Views: 651
Reputation: 74679
TL;DR Use local[2] at the very least.
The issue is the following line:
.master("local[1]")
Spark Streaming with receivers needs at least 2 threads: one is permanently taken by the receiver, so with a single thread your streaming jobs never even get a chance to start. They sit waiting for resources, i.e. a free thread to be allocated.
Quoting Spark Streaming's A Quick Example:
// The master requires 2 cores to prevent from a starvation scenario.
My recommendation is to use local[2] (the minimum) or local[*] to take as many cores as possible. The best solution is to use a cluster manager like Apache Mesos, Hadoop YARN, or Spark Standalone.
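Applied to the SparkSession builder from the question, the change is a single line; the rest of the job stays exactly as posted:

val spark = SparkSession
  .builder()
  .master("local[2]")  // one thread for the receiver, at least one for processing batches
  .appName("SocketStream")
  .getOrCreate()

With a second thread available, the receiver no longer starves the batch jobs, and the counts print every 5 seconds instead of only after shutdown.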
Upvotes: 2