Reputation: 21
pyspark
When I use print() inside the foreachRDD method, it works:
def echo(data):
    print(data)
....
lines = MQTTUtils.createStream(ssc, brokerUrl, topics)
topic_rdd = lines.map(lambda x: get_topic_rdd(x)).filter(lambda x: x[0]!= None)
topic_rdd.foreachRDD(lambda x: echo(x))
I can see the output in the console when running Spark on YARN.
But if I use foreachPartition, I can't see any output from print():
topic_rdd = lines.map(lambda x: get_topic_rdd(x)).filter(lambda x: x[0]!= None)
topic_rdd.foreachRDD(lambda x: x.foreachPartition(lambda y: echo(y)))
If I want to see the output, do I need to log in to each partition's node and look at its log? Can I see all of the output in a single console? By the way, I can see the output in a single console with Scala, but not with Python.
Upvotes: 2
Views: 1554
Reputation: 4719
The function passed to foreachRDD runs on the driver node, so its output is sent to your terminal.
The function passed to rdd.foreachPartition runs on the worker nodes, so its output goes to each worker's stdout, which you cannot see from your terminal.
If you want to see the logs, save them to files.
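As a rough sketch of the "save them to files" idea, the function below writes each partition's records to its own file. The names log_partition and LOG_DIR are made up for illustration, and the sketch assumes the directory is writable on every worker; on a real cluster you would more likely point it at shared storage or rely on the executor logs that YARN collects.

```python
import os
import tempfile
import uuid

# Hypothetical log directory (an assumption, not from the question);
# it must be writable on every worker node.
LOG_DIR = os.path.join(tempfile.gettempdir(), "partition_logs")


def log_partition(records):
    """Write every record of one partition to its own file; return the path."""
    os.makedirs(LOG_DIR, exist_ok=True)
    # A random suffix keeps partitions on the same worker from overwriting each other.
    path = os.path.join(LOG_DIR, "partition-%s.log" % uuid.uuid4().hex)
    with open(path, "w") as handle:
        for record in records:
            # Wrap in a tuple so tuple-shaped records format as one value.
            handle.write("%s\n" % (record,))
    return path


# Intended use in the streaming job from the question (not executed here):
# topic_rdd.foreachRDD(lambda rdd: rdd.foreachPartition(log_partition))
```

foreachPartition ignores the return value; the path is returned only so the function can be checked locally by calling it with a plain iterator.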
Upvotes: 1