osk

Reputation: 810

How does Spark web UI calculate duration for completed jobs?

So transformations in Spark are lazy which means that they are not executed until an action has been executed.

Does the same concept apply to the timing that is done by Spark? For instance, I have a program which reads in a bunch of text files, runs some algorithms on them and finally runs a foreach on the RDD which prints out some data.

When I inspect "Completed Jobs" in the Spark UI, there is only one job which is the foreach. This job took 1.5 min and the total uptime was 1.6 min. Does this 1.5 min include the time it took to read and create the initial RDD, run the algorithms on the RDD and finally run whatever was in the foreach clause (because the map, filter and other transformations I ran on the RDD are lazy)? Or does this 1.5 min only show the time it took for whatever code was in the foreach clause?

Side question: Is it possible to "mark" some functions to be timed, for instance if I have a RDD.map that I would like to know the time of?
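One common workaround (a sketch of the general pattern, not a dedicated Spark API) is to force evaluation of just the stage you care about with an action such as `count()` and measure wall-clock time around it. The idea is illustrated below with plain Python generators standing in for lazy RDD transformations, since a running Spark cluster is not assumed here:

```python
import time

# Plain-Python stand-in for a lazy RDD pipeline: a generator, like a
# Spark transformation, does no work until something consumes it.
data = range(1_000_000)
mapped = (x * 2 for x in data)      # "map": lazy, returns instantly

start = time.perf_counter()
total = sum(mapped)                 # the "action" triggers all the work
elapsed = time.perf_counter() - start

# All of the mapping cost lands inside the action's timing window,
# which mirrors how Spark attributes transformation time to the job
# that runs the action.
print(total)
```

In real Spark code the same pattern would wrap an action like `rdd.map(f).count()` between two timestamps; note that inserting an extra action to get a timing does add a real extra pass over the data.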

Upvotes: 1

Views: 775

Answers (1)

undefined_variable

Reputation: 6228

The reported time includes reading the data and applying every transformation needed to complete the action.

If some transformations are defined but not needed to complete the action, Spark will not run them at all.
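That laziness can be illustrated with a minimal sketch, using plain Python generators in place of RDDs (no Spark cluster assumed; `expensive_filter` is a hypothetical transformation used only to show which pipeline actually executes):

```python
executed = []

def expensive_filter(x):
    executed.append(x)      # record that the transformation actually ran
    return x % 2 == 0

data = range(10)
needed = (x * 2 for x in data)                      # consumed by the action below
unused = (x for x in data if expensive_filter(x))   # defined but never consumed

result = list(needed)       # the "action": only the needed pipeline runs

# The unused generator was never iterated, so expensive_filter was
# never called -- analogous to Spark skipping transformations that
# no action depends on.
print(result)
print(executed)
```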

Upvotes: 1
