Reputation: 4231
Why is the creation of Lazy
type so slow?
Assume the following code:
type T() =
let v = lazy (0.0)
member o.a = v.Value
type T2() =
member o.a = 0.0
#time "on"
for i in 0 .. 10000000 do
T() |> ignore
#time "on"
for i in 0 .. 10000000 do
T2() |> ignore
The first loop gives me: Real: 00:00:00.647
whereas the second loop gives me Real: 00:00:00.051
. Lazy is 13X slower!!
I have tried to optimize my code in this way and I ended up with simulation code 6X slower. It was then fun to track back where the slow down occurred...
Upvotes: 4
Views: 534
Reputation: 911
Update int the .net core world
A new constructor was added to Lazy to handle constants such as your case. Unfortunately F#'s lazy
"pseudo keyword" always (at the moment!) will wrap constants as functions.
Anyway, if you change:
let v = lazy (0.0)
to:
let v = Lazy<_> 0.0 // NB. Only .net core at the moment
then you will find that your T()
class will only take ~3 times your T2
.
(What's the point of having a lazy constant? Well means that you can use Lazy as an abstraction with quite little overhead when you do have a mix of constants and real lazy items...)
...and...
If you actually use the created value a number of times then the overhead shrinks further. i.e. something such as:
open System.Diagnostics
type T() =
let v = Lazy<_> 0.1
member o.a () = v.Value
type T2() =
member o.a () = 0.1
let withLazyType () =
let mutable sum = 0.0
for i in 0 .. 10000000 do
let t = T()
for __ = 1 to 10 do
sum <- sum + t.a()
sum
let withoutLazyType () =
let mutable sum = 0.0
for i in 0 .. 10000000 do
let t = T2()
for __ = 1 to 10 do
sum <- sum + t.a()
sum
let runtest name count f =
let mutable checksum = 0.
let mutable totaltime = 0L
for i = 0 to count do
if i = 0 then
f () |> ignore // warm up
else
let sw = Stopwatch.StartNew ()
checksum <- checksum + f ()
totaltime <- totaltime + sw.ElapsedMilliseconds
printfn "%s: %4d (checksum=%f for %d runs)" name (totaltime/int64 count) checksum count
[<EntryPoint>]
let main _ =
runtest "w/o lazy" 10 withoutLazyType
runtest "with lazy" 10 withLazyType
0
brings the difference in time to < 2 times.
NB. I worked on the new lazy implementation...
Upvotes: 2
Reputation: 25516
The Lazy version has some significant overhead code -
60 .method public specialname
61 instance default float64 get_a () cil managed
62 {
63 // Method begins at RVA 0x2078
64 // Code size 14 (0xe)
65 .maxstack 3
66 IL_0000: ldarg.0
67 IL_0001: ldfld class [FSharp.Core]System.Lazy`1<float64> Test/T::v
68 IL_0006: tail.
69 IL_0008: call instance !0 class [FSharp.Core]System.Lazy`1<float64>::get_Value()
70 IL_000d: ret
71 } // end of method T::get_a
Compare this to the direct version
.method public specialname
130 instance default float64 get_a () cil managed
131 {
132 // Method begins at RVA 0x20cc
133 // Code size 10 (0xa)
134 .maxstack 3
135 IL_0000: ldc.r8 0.
136 IL_0009: ret
137 } // end of method T2::get_a
So the direct version has a load and then return, whilst the indirect version has a load then a call and then a return.
Since the lazy
version has an extra call I would expect it to be significantly slower.
UPDATE:
So I wondered if we could create a custom version of lazy
which did not require the method calls - I also updated the test to actual call the method rather than just create the objects. Here is the code:
type T() =
let v = lazy (0.0)
member o.a() = v.Value
type T2() =
member o.a() = 0.0
type T3() =
let mutable calculated = true
let mutable value = 0.0
member o.a() = if calculated then value else failwith "not done";;
#time "on"
let lazy_ =
for i in 0 .. 1000000 do
T().a() |> ignore
printfn "lazy"
#time "on"
let fakelazy =
for i in 0 .. 1000000 do
T3().a() |> ignore
printfn "fake lazy"
#time "on"
let direct =
for i in 0 .. 1000000 do
T2().a() |> ignore
printfn "direct";;
Which gives the following result:
lazy
Real: 00:00:03.786, CPU: 00:00:06.443, GC gen0: 7
val lazy_ : unit = ()
--> Timing now on
fake lazy
Real: 00:00:01.627, CPU: 00:00:02.858, GC gen0: 2
val fakelazy : unit = ()
--> Timing now on
direct
Real: 00:00:01.759, CPU: 00:00:02.935, GC gen0: 2
val direct : unit = ()
Here the lazy
version is only 2x slower than the direct version and the fake lazy version is even slightly faster than the direct version - this is probably due to a GC happening during the benchmark.
Upvotes: 7