Yong Hyun Kwon

Reputation: 379

Spark Understanding Interval between jobs

In the Spark UI, I am wondering what is going on between jobs, and I am looking for ways to reduce those gaps, especially the one after collect and before writing Parquet. I see a really long break before the Parquet job is submitted, almost 1 minute. Considering the whole application takes 2 minutes, that is a large proportion. Does this break usually mean Spark is going over all the workers and collecting data? Even so, the interval before the Parquet write is much longer than the one before other actions, such as collect or first. Thanks

Here is the image: [screenshot of the Spark UI]

Upvotes: 0

Views: 1227

Answers (1)

Travis Hegner

Reputation: 2495

In my experience, that delay is generally present when the driver portion of your job is busy doing work. For instance, if you do a .collect() and then iterate over the resulting Array, that work is done sequentially on the driver, so no tasks are assigned to the executors during that time.
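One way to check whether a gap comes from driver-side work is to time the code that runs between the two actions. The following is only a minimal sketch in plain Python: `driver_step` and `timings` are hypothetical names (not part of Spark), and the `rows` list is a stand-in for whatever `.collect()` returned in your job.

```python
import time
from contextlib import contextmanager

# Hypothetical helper (not a Spark API): records the wall-clock time of
# each labeled driver-side step.
timings = {}

@contextmanager
def driver_step(label):
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start

# Stand-in for the result of a .collect() call; in a real job this data
# would come from Spark.
rows = list(range(100_000))

with driver_step("iterate collected rows"):
    # This loop runs only in the driver process. While it runs, no tasks
    # are scheduled on the executors, which appears as a gap between jobs.
    total = 0
    for r in rows:
        total += r

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f}s")
```

In an actual PySpark application you could wrap everything between the `collect()` call and the `df.write.parquet(...)` call in such a block. If the measured time roughly matches the gap in the UI, the time is being spent on the driver rather than on the cluster.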

Upvotes: 1
