Reputation: 2692
I am studying Spark by watching videos like this one from a Spark Summit presentation: https://youtu.be/G7PcSBhfSQo?t=8135. It is a very good video, but I have a question about the slide shown at the timestamp in that link (I am also attaching a screenshot of that slide below). My puzzlement arises from the fact that, on the slide, the min/max/median duration of the two tasks analyzed is 11 seconds, yet the subtask times shown (scheduler delay, GC time, 'getting result' time, etc.) add up to nowhere near 11 seconds. What else could be happening that bumps the total task duration to 11 seconds? Is there some other screen that would show this (seemingly) missing information? Thanks in advance!
--
Upvotes: 0
Views: 599
Reputation: 1027
A task's execution time can be broken up as:

Scheduler Delay + Deserialization Time + Shuffle Read Time (optional) + Executor Runtime + Shuffle Write Time (optional) + Result Serialization Time + Getting Result Time

Tuning these aspects can help optimize performance. (Source: IBM Knowledge Center)
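In other words, the large "missing" component is Executor Runtime: the small metrics the UI highlights (scheduler delay, GC time, getting-result time) are only slivers of the total, and GC time is itself counted inside executor runtime rather than added on top. A minimal sketch of the arithmetic, with entirely made-up millisecond values (none of these numbers come from the slide), shows how the pieces can sum to roughly 11 seconds even though each highlighted metric is tiny:

```python
# Hypothetical per-task metrics in milliseconds (illustrative values only,
# not taken from the slide in the video).
metrics = {
    "scheduler_delay": 8,
    "task_deserialization_time": 3,
    "shuffle_read_time": 1200,       # optional: only present for shuffle-read stages
    "executor_runtime": 9500,        # the dominant component; includes GC time
    "shuffle_write_time": 250,       # optional: only present for shuffle-write stages
    "result_serialization_time": 2,
    "getting_result_time": 5,
}

total_ms = sum(metrics.values())
print(f"reconstructed task duration: {total_ms} ms")
# The highlighted metrics alone account for very little of the total:
highlighted = (metrics["scheduler_delay"]
               + metrics["result_serialization_time"]
               + metrics["getting_result_time"])
print(f"scheduler delay + serialization + getting result: {highlighted} ms")
```

Here the reconstructed duration is about 11 seconds while the highlighted subtask metrics total only a few milliseconds, which matches the discrepancy described in the question.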
Upvotes: 2