Reputation: 379
In the Spark UI, I am wondering what is going on between jobs, and I am looking for ways to reduce these gaps, especially the one after collect and before writing Parquet. I see a really long break before the Parquet job is submitted, almost 1 minute. Considering the whole application takes 2 minutes, that is a large proportion. Does this break usually mean Spark is going over all the workers and collecting data? Even so, the interval before the Parquet write is much longer than the one before other actions, such as collect or first. Thanks
Upvotes: 0
Views: 1227
Reputation: 2495
In my experience, that delay is generally present when the driver portion of your job is busy doing work. For instance, if you do a .collect(), and then iterate over the resulting Array, that work is being done sequentially on the driver, and would result in no tasks being assigned to the executors during that time.
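One way to check whether driver-side work accounts for the gap is to time the code that runs between two actions and compare it with the idle interval shown in the Spark UI. The following is only a rough sketch in plain Python: `timed` and `timings` are hypothetical helpers rather than anything from Spark, and `rows` is a stand-in for whatever `.collect()` would return in the real job.

```python
import time
from contextlib import contextmanager

# Hypothetical helper (not part of Spark): records the wall-clock time spent
# in a block of driver-side code, keyed by a label.
timings = {}

@contextmanager
def timed(label):
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start

# Assumption: stand-in for the result of a .collect() call. In a real job this
# would be the collected rows sitting in driver memory.
rows = list(range(100_000))

with timed("driver-side loop"):
    # Sequential work on the driver. While this runs, no tasks are
    # scheduled on the executors, so the UI shows a gap between jobs.
    total = 0
    for r in rows:
        total += r * 2

# Print how long each labelled driver-side section took.
for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f}s")
```

If the measured time is close to the break seen before the Parquet job, the driver-side loop is the likely cause, and moving that work into a distributed transformation would probably shrink the gap.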
Upvotes: 1