I have a large image dataset that is processed by a machine learning model (a CNN) to produce results. To understand the performance of the Spark job, I'm trying to trace how it flows internally on YARN. The Spark UI shows the list of jobs, the stages (DAG), and executor and worker node details, but I also want to print the contents of the RDD in the console.
Is it possible to find out how the images are chunked (partitioned) across the nodes?
I tried df.rdd.glom().collect(), but it did not print anything. df.collect() returned the image values as an array of arrays, but that is the consolidated result on the driver rather than a per-node view.
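
For reference, here is a minimal sketch of the kind of per-partition view I am after, assuming df is a PySpark DataFrame. The helper names (summarize_partition, partition_layout, rows_per_partition) are my own and hypothetical, not part of Spark. It only counts rows per partition and records the hostname of the machine that processed each one, so the images themselves are not pulled back to the driver.

```python
import socket


def summarize_partition(index, rows):
    """Yield one (partition index, hostname, row count) tuple for a partition."""
    # Consume the iterator without keeping the (large) image rows in memory.
    count = sum(1 for _ in rows)
    # gethostname() runs wherever this function executes, i.e. on the executor.
    yield (index, socket.gethostname(), count)


def partition_layout(df):
    """Return a list of (partition index, hostname, row count) for a DataFrame.

    `df` is assumed to be a pyspark.sql.DataFrame. Only the small summary
    tuples are collected to the driver, not the image data.
    """
    return df.rdd.mapPartitionsWithIndex(summarize_partition).collect()


def rows_per_partition(df):
    """DataFrame-only alternative: row count per partition id."""
    # Imported here so the helpers above can be defined without Spark installed.
    from pyspark.sql.functions import spark_partition_id

    return (
        df.groupBy(spark_partition_id().alias("partition_id"))
        .count()
        .orderBy("partition_id")
    )
```

Calling it would look like `for idx, host, n in partition_layout(df): print(idx, host, n)`, or `rows_per_partition(df).show()`. Note that collect() only returns a value: in a script, nothing appears unless the result is passed to print, which may be why df.rdd.glom().collect() seemed to show nothing. glom().collect() also ships every partition's full contents to the driver, which can be heavy for image data.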