wu alex

Reputation: 21

How can I capture the log output of pyspark foreachPartition?

pyspark

When I use print() inside a function passed to foreachRDD, it works:

def echo(data):
    print(data)

....

lines = MQTTUtils.createStream(ssc, brokerUrl, topics)

topic_rdd = lines.map(lambda x: get_topic_rdd(x)).filter(lambda x: x[0] != None)

topic_rdd.foreachRDD(lambda x: echo(x))

I can see the output in the console when running Spark on YARN.

But if I use foreachPartition instead, I can't see any output from print():

topic_rdd = lines.map(lambda x: get_topic_rdd(x)).filter(lambda x: x[0] != None)

topic_rdd.foreachRDD(lambda x: x.foreachPartition(lambda y: echo(y)))

If I want to see the output, do I have to go to each partition's node to look at its log? Can I see all the output in a single console? By the way, I can see the output in a single console with Scala, but not with Python.

Upvotes: 2

Views: 1554

Answers (1)

Zhang Tong

Reputation: 4719

The function you pass to foreachRDD runs on the driver node, so its print() output goes to your terminal.

The function you pass to foreachPartition runs on the worker nodes, so its print() output goes to each worker's stdout, which you cannot see from the driver's terminal.
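If the data is small, one way to see everything in a single console is to bring the records back to the driver and print them there. This is a sketch, not part of the original answer; it reuses topic_rdd from the question and assumes the amount of data per batch is small enough for collect():

def echo_on_driver(rdd):
    # collect() ships the records from the workers to the driver,
    # so this print() appears in the driver's console.
    for record in rdd.collect():
        print(record)

topic_rdd.foreachRDD(echo_on_driver)

Note that collect() defeats the purpose of foreachPartition for large data; it is only a debugging aid.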

If you want to see those logs, save them to files instead.
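A minimal sketch of that idea, assuming nothing beyond the question's topic_rdd; the file path is an assumption and must exist on your workers:

import logging

def log_partition(iterator):
    # Runs on a worker: append this partition's records to a local
    # file on that worker instead of printing to its stdout.
    logging.basicConfig(filename="/tmp/partition.log", level=logging.INFO)
    for record in iterator:
        logging.info("record: %s", record)

topic_rdd.foreachRDD(lambda rdd: rdd.foreachPartition(log_partition))

On YARN with log aggregation enabled, you can also fetch all workers' stdout/stderr in one place after the job finishes with yarn logs -applicationId <appId>.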

Upvotes: 1

Related Questions