Reputation: 879
I'm new to Haskell and I don't understand what happens when you define a where
section in a function.
For example, in the following function
f x = y + y
where y = product [0..x]
I don't understand if y
is only replaced by product [0..x]
and computed twice, or if product [0..x]
is computed once and its result is saved in something like a variable called y
and then make the sum.
Wouldn't it be inefficient if it's computed twice?
Upvotes: 2
Views: 146
Reputation: 74334
Haskell is pure so that the value is the same if you were to inline y
wherever you liked
y + y where y = product [0..x] == product [0..x] + product [0..x]
but Haskell allows its implementations to choose faster execution paths if desired. In particular, Haskell allows for a lazy (different from the "non-strict" semantics which Haskell demands of its implementations) method of computation
y + y where y = product [0..x] -- where-let transform
==
let y = product [0..x] in y + y -- store 'y' on heap
==
{y = THUNK} y + y -- (+) forces its arguments
-- let's assume x = 6
==
{y = product [0..6]} y + y -- we still need `y`, so eval
==
{y = 0} y + y -- replace
==
0 + 0
==
0
As you can see, one way to implement this code is to give y
a value in the heap as a THUNK
, a delayed value, and then compute it as needed and use the final computed value wherever y
is needed. This is known as lazy graph reduction semantics and is indeed what GHC implements.
Note that it's technically possible for GHC to do the opposite, to transform
product [0..x] + product [0..x]
into
let y = product [0..x] in y + y
and the execute it as before, saving some computation. An optimizing compiler could go around looking for opportunities like this, but it must do so with restraint! It's very easy to have your compiler produce code which performs much worse (space leakages) when lifting common subexpressions like this.
For that reason, while GHC will use the names you directly write in order to save repeated computation, it's unlikely to eliminate common subexpressions on its own.
When in doubt, use a let
or a where
!
Upvotes: 7
Reputation: 54058
This binds a value to the name y
and then uses it twice in a definition. Your intuition is correct that it would be inefficient to compute it twice, and it would if you defined it as
f x = product [0..x] + product [0..x]
However it may be possible for GHC to optimize this away, but probably without Apparently not.-O2
. Haven't tested this theory though.
Upvotes: 3