dan

Reputation: 11

How to find which input image/data is processed on which worker node in Spark?

I have a large image dataset that is processed by a machine learning model (a CNN) to produce results. To understand the job's performance, I'm trying to see how Spark (on YARN) runs the job internally. The Spark UI shows the list of jobs, stages (with the DAG), executors, and worker node details, but I would like to print the contents of the RDD in the console. Is it possible to find out how the images are split across the nodes?

I tried df.rdd.glom().collect(), but it did not print anything. df.collect() returned the image values as an array of arrays, but that is the consolidated result rather than a per-node view.
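For reference, here is a minimal sketch of one way to get the per-node view I am after, assuming PySpark. The helper name describe_partition and its sample_size parameter are hypothetical, not part of Spark. The idea is to pass a function to RDD.mapPartitionsWithIndex so that it runs on the executors and reports the hostname, the partition index, and the number of rows in each partition. Note also that outside an interactive shell, the value returned by collect() is not displayed unless it is passed to print(), which may be why glom().collect() appeared to print nothing.

```python
import socket


def describe_partition(index, rows, sample_size=3):
    """Yield one summary per partition: (host, partition index, row count, sample rows).

    Hypothetical helper: when passed to RDD.mapPartitionsWithIndex it is called
    once per partition on whichever executor holds that partition.
    """
    host = socket.gethostname()  # hostname of the machine running this call
    sample = []
    count = 0
    for row in rows:
        if count < sample_size:
            sample.append(row)  # keep only the first few rows as a sample
        count += 1
    yield (host, index, count, sample)


# Assumed usage with an existing DataFrame `df` (not created in this sketch):
# summaries = df.rdd.mapPartitionsWithIndex(describe_partition).collect()
# for host, idx, n, _ in summaries:
#     print(host, idx, n)
```

Because each image row can be large, it is probably better to select only an identifying column (for example a file path, if the DataFrame has one) before collecting the samples to the driver. The built-in pyspark.sql.functions.spark_partition_id() can also be added as a column to tag each row with its partition number, although it does not report the hostname.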

Upvotes: 1

Views: 64

Answers (0)
