Saikiran Yerram

Reputation: 3092

Spark streaming not working

I have a rudimentary Spark Streaming word count and it's just not working.

import sys
from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName='streaming', master="local[*]")
scc = StreamingContext(sc, batchDuration=5)

lines = scc.socketTextStream("localhost", 9998)
words = lines.flatMap(lambda line: line.split())
counts = words.map(lambda word: (word, 1)).reduceByKey(lambda x, y: x + y)

counts.pprint()

print 'Listening'
scc.start()
scc.awaitTermination()  

In another terminal I am running `nc -lk 9998`, and I pasted some text into it. Spark prints the typical logs (no exceptions), but the job ends up queued for some weird duration (45 yrs) and it keeps printing this...

15/06/19 18:53:30 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:874
15/06/19 18:53:30 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (PythonRDD[7] at RDD at PythonRDD.scala:43)
15/06/19 18:53:30 INFO TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
15/06/19 18:53:35 INFO JobScheduler: Added jobs for time 1434754415000 ms
15/06/19 18:53:40 INFO JobScheduler: Added jobs for time 1434754420000 ms
15/06/19 18:53:45 INFO JobScheduler: Added jobs for time 1434754425000 ms
...
...

What am I doing wrong?

Upvotes: 3

Views: 1916

Answers (1)

Holden

Reputation: 7442

Spark Streaming requires multiple executors to work: the socket receiver permanently occupies one core, so if only one is available there is nothing left to process the queued batches. Try using local[4] for the master.
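As a minimal sketch of that fix, here is the question's code with the master changed to `local[4]` (the port and batch interval are kept from the question; this assumes a local Spark installation and a `nc -lk 9998` listener, so it won't run standalone):

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

# At least two local threads: one is taken by the socket receiver,
# the rest are free to process the received batches.
sc = SparkContext(appName='streaming', master="local[4]")
ssc = StreamingContext(sc, batchDuration=5)

lines = ssc.socketTextStream("localhost", 9998)
counts = (lines.flatMap(lambda line: line.split())   # split lines into words
               .map(lambda word: (word, 1))          # pair each word with 1
               .reduceByKey(lambda x, y: x + y))     # sum counts per word
counts.pprint()

ssc.start()
ssc.awaitTermination()
```

With `local[1]` (or `local[*]` on a single-core machine) the same code produces exactly the symptom in the question: jobs are added every batch interval but never complete.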

Upvotes: 4
