Reputation: 61
I've been learning Haskell, and I made a few statistical functions that work on lists of either Int or Float types.
Here are two examples:
mean :: (Real a, Floating b) => [a] -> b
mean xs = realToFrac (sum xs) / genericLength xs
sumOfSquares :: (Real a, Floating b) => [a] -> b
sumOfSquares xs = sum [(mean xs - realToFrac x) ^^ 2 | x <- xs]
However, I am now wondering if I could improve the runtime by writing sumOfSquares differently. Like this:
sumOfSquares xs = sum [(m - realToFrac x) ^^ 2 | x <- xs] where m = mean xs
or like this:
sumOfSquares xs = sum [(m - realToFrac x) ^^ 2 | x <- xs, let m = mean xs]
I don't know how Haskell handles this behind the scenes. In the original version, does the computer calculate mean xs for each entry in xs? I'm assuming Haskell doesn't optimize comprehensions in the background. Am I wrong?
Upvotes: 2
Views: 308
Reputation: 70367
The desugaring does produce different code. There are three implementations you're considering here. I'll show what each of them would look like without the list comprehension.
-- With no let / where
sumOfSquares xs = sum [(mean xs - realToFrac x) ^^ 2 | x <- xs]
sumOfSquares xs = sum (map (\x -> (mean xs - realToFrac x) ^^ 2) xs)
-- With let inside the comprehension
sumOfSquares xs = sum [(m - realToFrac x) ^^ 2 | x <- xs, let m = mean xs]
sumOfSquares xs = sum (map (\x -> let m = mean xs in (m - realToFrac x) ^^ 2) xs)
-- With where outside the comprehension
sumOfSquares xs = sum [(m - realToFrac x) ^^ 2 | x <- xs] where m = mean xs
sumOfSquares xs = sum (map (\x -> (m - realToFrac x) ^^ 2) xs) where m = mean xs
So the first two are essentially equivalent. If you have the world's simplest compiler (or you turn off optimizations), you'll get fairly inefficient code that recomputes the mean on every iteration. The third avoids this completely, by binding the computation in a where clause outside the lambda. That doesn't guarantee the value won't be recomputed at every step (Haskell is free to compute things however it chooses, so long as the result behaves equivalently to lazy evaluation), but it does improve the odds if your compiler is not very smart.
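To make this concrete, here is a self-contained sketch (the function names are mine) of the naive and shared versions. Both compute the same result; they differ only in how often mean xs may be evaluated when the compiler does not optimize:

```haskell
import Data.List (genericLength)

mean :: (Real a, Floating b) => [a] -> b
mean xs = realToFrac (sum xs) / genericLength xs

-- May recompute the mean once per element if nothing is optimized
sumOfSquaresNaive :: (Real a, Floating b) => [a] -> b
sumOfSquaresNaive xs = sum [(mean xs - realToFrac x) ^^ 2 | x <- xs]

-- Computes the mean once, shared across all elements via the where clause
sumOfSquaresShared :: (Real a, Floating b) => [a] -> b
sumOfSquaresShared xs = sum [(m - realToFrac x) ^^ 2 | x <- xs]
  where m = mean xs

main :: IO ()
main = do
  print (sumOfSquaresNaive  [1, 2, 3 :: Int] :: Double)  -- 2.0
  print (sumOfSquaresShared [1, 2, 3 :: Int] :: Double)  -- 2.0
```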
However, realistically, GHC (and most other compilers) will perform what's called the float-out optimization. Basically, if the compiler sees a let binding that it can move outside of a repeated computation, it is free to do so, since there are no possible side effects that would change in doing so. In general, GHC will take expressions of the form
\x -> let y = something in f x y
and convert them to
let y = something in \x -> f x y
This optimization is allowed so long as x does not appear as a free variable inside something, and most compilers are fairly smart about doing this, so you shouldn't have much to worry about.
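Here is a small, self-contained illustration (all names are hypothetical). The first two functions show the rewrite the compiler may perform; the third binding cannot be floated out, because x is free in the bound expression:

```haskell
-- The lambda rebinds `y` on every call...
before :: [Double] -> [Double]
before = map (\x -> let y = sum [1 .. 1000] in x + y)

-- ...so the binding may be floated out and computed once:
after :: [Double] -> [Double]
after = let y = sum [1 .. 1000] in map (\x -> x + y)

-- Cannot be floated out: the bound expression mentions x.
notFloatable :: [Double] -> [Double]
notFloatable = map (\x -> let y = x * 2 in x + y)

main :: IO ()
main = do
  print (before [1, 2] == after [1, 2])  -- True
  print (notFloatable [1, 2])            -- [3.0,6.0]
```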
There's an excellent StackOverflow answer about a lot of common optimizations performed by GHC, from which this answer is partially inspired. I suggest checking it out if you want to learn more about efficiency in Haskell.
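As a side note on efficiency (my own sketch, not part of the discussion above): if you want to avoid even the two traversals that sharing the mean requires, you can compute the sum of squared deviations in a single pass, using the identity sum (x - m)^2 = sum x^2 - (sum x)^2 / n. Be aware that this identity can lose floating-point precision when the values are large and close together:

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

-- One strict pass, accumulating count, sum, and sum of squares.
sumOfSquaresOnePass :: Real a => [a] -> Double
sumOfSquaresOnePass xs = sq - s * s / n
  where
    (n, s, sq) = foldl' step (0, 0, 0) xs
    step (!c, !sm, !ss) x =
      let y = realToFrac x in (c + 1, sm + y, ss + y * y)

main :: IO ()
main = print (sumOfSquaresOnePass [1, 2, 3 :: Int])  -- 2.0
```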
Upvotes: 2