Chris Bedford

Reputation: 2692

Summary metrics for completed tasks in Spark UI: subtask times total way less than duration?

I am studying Spark by reviewing videos like this one (https://youtu.be/G7PcSBhfSQo?t=8135) from a Spark Summit presentation. It is a very good video, but I have a question about the slide shown at the start point of the YouTube link (I am also attaching a screenshot of that slide below). On that slide, the min/max/median duration of the 2 tasks analyzed is 11 seconds. However, the total of the subtask times (scheduler delay, GC time, 'getting result' time, etc.) is nowhere near 11 seconds. What else could be happening that bumped the total task duration to 11 seconds? Is there some other screen that would show this (seemingly) missing info? Thanks in advance!
-- spark UI

Upvotes: 0

Views: 599

Answers (1)

Richard EB

Reputation: 1027

"A task's execution time can be broken up as Scheduler Delay + Deserialization Time + Shuffle Read Time (optional) + Executor Runtime + Shuffle Write Time (optional) + Result Serialization Time + Getting Result Time. Tuning these aspects can help optimize performance." (IBM Knowledge Centre)
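To make the arithmetic concrete, here is a minimal sketch of the breakdown quoted above. It assumes the components simply add up as listed. The class, the helper function, and every number are hypothetical and chosen only for illustration; they are not read from the slide in the question. The point is that if the small overhead items are nowhere near the total, the remainder would have to be Executor Runtime.

```python
from dataclasses import dataclass


@dataclass
class TaskTimes:
    """Per-task timings in milliseconds. All values used below are made up."""
    scheduler_delay: int
    deserialization: int
    shuffle_read: int          # optional component, 0 if the task reads no shuffle data
    executor_runtime: int
    shuffle_write: int         # optional component, 0 if the task writes no shuffle data
    result_serialization: int
    getting_result: int

    def total(self) -> int:
        # Sum of every component named in the quoted breakdown.
        return (self.scheduler_delay + self.deserialization + self.shuffle_read
                + self.executor_runtime + self.shuffle_write
                + self.result_serialization + self.getting_result)

    def overhead(self) -> int:
        # Everything except the time the executor spent running the task itself.
        return self.total() - self.executor_runtime


def implied_executor_runtime(total_ms: int, overhead_ms: int) -> int:
    # Under the additive assumption, whatever the overhead items do not
    # explain is attributed to executor runtime (never negative).
    return max(0, total_ms - overhead_ms)


# Hypothetical task: about 11 seconds in total, with only 100 ms of overhead.
example = TaskTimes(
    scheduler_delay=20,
    deserialization=15,
    shuffle_read=0,
    executor_runtime=10_900,
    shuffle_write=0,
    result_serialization=5,
    getting_result=60,
)

print(example.total())                         # 11000
print(example.overhead())                      # 100
print(implied_executor_runtime(11_000, 100))   # 10900
```

With numbers shaped like these, the small items would sum to a fraction of a second while the task still takes about 11 seconds, which matches the kind of gap described in the question.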

Upvotes: 2
