Oldrich Svec
Oldrich Svec

Reputation: 4231

Performance of Lazy in F#

Why is the creation of Lazy type so slow?

Assume the following code:

type T() =
  let v = lazy (0.0)
  member o.a = v.Value

type T2() =
  member o.a = 0.0

#time "on"

for i in 0 .. 10000000 do
  T() |> ignore

#time "on"

for i in 0 .. 10000000 do
  T2() |> ignore

The first loop gives me: Real: 00:00:00.647 whereas the second loop gives me Real: 00:00:00.051. Lazy is 13X slower!!

I have tried to optimize my code in this way and I ended up with simulation code 6X slower. It was then fun to track back where the slow down occurred...

Upvotes: 4

Views: 534

Answers (2)

Paul Westcott
Paul Westcott

Reputation: 911

Update int the .net core world

A new constructor was added to Lazy to handle constants such as your case. Unfortunately F#'s lazy "pseudo keyword" always (at the moment!) will wrap constants as functions.

Anyway, if you change:

let v = lazy (0.0)

to:

let v = Lazy<_> 0.0 // NB. Only .net core at the moment

then you will find that your T() class will only take ~3 times your T2.

(What's the point of having a lazy constant? Well means that you can use Lazy as an abstraction with quite little overhead when you do have a mix of constants and real lazy items...)

...and...

If you actually use the created value a number of times then the overhead shrinks further. i.e. something such as:

open System.Diagnostics

type T() =
  let v = Lazy<_> 0.1
  member o.a () = v.Value

type T2() =
  member o.a () = 0.1

let withLazyType () =
    let mutable sum = 0.0
    for i in 0 .. 10000000 do
        let t = T()
        for __ = 1 to 10 do
            sum <- sum + t.a()
    sum

let withoutLazyType () =
    let mutable sum = 0.0
    for i in 0 .. 10000000 do
        let t = T2()
        for __ = 1 to 10 do
            sum <- sum + t.a()
    sum

let runtest name count f =
    let mutable checksum = 0.
    let mutable totaltime = 0L
    for i = 0 to count do
        if i = 0 then
            f () |> ignore // warm up
        else
            let sw = Stopwatch.StartNew ()
            checksum <- checksum + f ()
            totaltime <- totaltime + sw.ElapsedMilliseconds
    printfn "%s: %4d (checksum=%f for %d runs)" name (totaltime/int64 count) checksum count

[<EntryPoint>]
let main _ =
    runtest "w/o  lazy" 10 withoutLazyType
    runtest "with lazy" 10 withLazyType

    0

brings the difference in time to < 2 times.

NB. I worked on the new lazy implementation...

Upvotes: 2

John Palmer
John Palmer

Reputation: 25516

The Lazy version has some significant overhead code -

 60     .method public specialname
 61            instance default float64 get_a ()  cil managed
 62     {
 63         // Method begins at RVA 0x2078
 64     // Code size 14 (0xe)
 65     .maxstack 3
 66     IL_0000:  ldarg.0
 67     IL_0001:  ldfld class [FSharp.Core]System.Lazy`1<float64> Test/T::v
 68     IL_0006:  tail.
 69     IL_0008:  call instance !0 class [FSharp.Core]System.Lazy`1<float64>::get_Value()
 70     IL_000d:  ret
 71     } // end of method T::get_a

Compare this to the direct version

 .method public specialname
130            instance default float64 get_a ()  cil managed
131     {
132         // Method begins at RVA 0x20cc
133     // Code size 10 (0xa)
134     .maxstack 3
135     IL_0000:  ldc.r8 0.
136     IL_0009:  ret
137     } // end of method T2::get_a

So the direct version has a load and then return, whilst the indirect version has a load then a call and then a return.

Since the lazy version has an extra call I would expect it to be significantly slower.

UPDATE: So I wondered if we could create a custom version of lazy which did not require the method calls - I also updated the test to actual call the method rather than just create the objects. Here is the code:

type T() =
  let v = lazy (0.0)
  member o.a() = v.Value

type T2() =
  member o.a() = 0.0

type T3() = 
  let mutable calculated = true
  let mutable value = 0.0
  member o.a() = if calculated then value else failwith "not done";;

#time "on"
let lazy_ = 
  for i in 0 .. 1000000 do
    T().a() |> ignore
  printfn "lazy"
#time "on"
let fakelazy = 
  for i in 0 .. 1000000 do
    T3().a() |> ignore
  printfn "fake lazy"

#time "on"
let direct = 
  for i in 0 .. 1000000 do
    T2().a() |> ignore
  printfn "direct";;

Which gives the following result:

lazy
Real: 00:00:03.786, CPU: 00:00:06.443, GC gen0: 7

val lazy_ : unit = ()


--> Timing now on

fake lazy
Real: 00:00:01.627, CPU: 00:00:02.858, GC gen0: 2

val fakelazy : unit = ()


--> Timing now on

direct
Real: 00:00:01.759, CPU: 00:00:02.935, GC gen0: 2

val direct : unit = ()

Here the lazy version is only 2x slower than the direct version and the fake lazy version is even slightly faster than the direct version - this is probably due to a GC happening during the benchmark.

Upvotes: 7

Related Questions