El-ad David Amir

Reputation: 197

How can I benchmark performance in Spark console?

I have just started using Spark, and my interactions with it revolve around spark-shell at the moment. I would like to benchmark how long various commands take, but could not find a way to measure elapsed time or run a benchmark. Ideally I would want to do something super-simple, such as:

val t = [current time]
data.map(etc).distinct().reduceByKey(_ + _)
println([current time] - t)

Edit: Figured it out --

import org.joda.time._
val t_start = DateTime.now()
[[do stuff]]
val t_end = DateTime.now()
new Period(t_start, t_end).toStandardSeconds()
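
For example, with a hypothetical RDD called data and a placeholder pipeline in place of [[do stuff]], the full pattern might look like:

import org.joda.time._

val t_start = DateTime.now()
// hypothetical Spark job; collect() is an action, so the computation actually runs here
val result = data.map(x => (x, 1)).distinct().reduceByKey(_ + _).collect()
val t_end = DateTime.now()
new Period(t_start, t_end).toStandardSeconds()  // elapsed wall-clock time in whole seconds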

Upvotes: 1

Views: 999

Answers (1)

eliasah

Reputation: 40380

I suggest you do the following:

def time[A](f: => A) = {
  val s = System.nanoTime   // start timestamp
  val ret = f               // evaluate the by-name block
  println("time: " + (System.nanoTime - s) / 1e9 + " seconds")
  ret
}

You can pass any expression as an argument to the time function; it evaluates the expression, prints how long the evaluation took in seconds, and returns the result.

Let's consider a function foobar that takes data as an argument. You can then do the following:

val test = time(foobar(data))

test will contain the result of foobar, and the time taken will be printed as well.
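
In the spark-shell itself, a minimal sketch (assuming data is an RDD of strings; the pipeline is only an illustration) could look like:

// count() is an action, so the whole job runs inside time()
val wordCount = time {
  data.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _).count()
}

Keep in mind that Spark transformations are lazy, so make sure the block you time ends with an action (count, collect, saveAsTextFile, ...); otherwise you only measure how long it takes to build the lineage, not the actual computation.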

Upvotes: 3
