Ftyupl
Ftyupl

Reputation: 79

Sharing data in Haskell

I would like to understand how the memory sharing mechanism works in Haskell. Indeed, a way to program a function calculating the terms of the fibonnaci sequence is:

fibo' n = f n
  where 
    f 0 = 1
    f 1 = 1
    f n = fibs !! (n-1) + fibs !! (n-2)
    fibs = [f x | x <- [0..]]

This assumes for there to be an improvement that the fibs list is not evaluated multiple times but is shared between calls but I'm not sure about this assumption or how it works.

Upvotes: 4

Views: 280

Answers (2)

Carl
Carl

Reputation: 27023

To a first approximation in GHC, it's the same value in memory if and only if it's the same variable. Naming a thing creates the potential for sharing it.

Exceptions to this rule:

  • If GHC decides to inline a definition, it's duplicated each place it's inlined. In general this isn't a problem, as only "simple" things are inlined. This only really becomes something you need to worry about if your value is using unsafePerformIO with something that isn't actually pure and you were counting on sharing so that the value remains consistent across an entire program execution. The typical solution here is to mark things defined that unsafely with a NOINLINE pragma.
  • If GHC does common subexpression elimination, that can introduce sharing of an expression without explicitly naming it. This should only matter if the heuristics for when to apply that transformation misfire, and they've been very good for a long time now. I guess this is a problem for libraries like criterion which want to time evaluating the same expression repeatedly. I dug through its source to figure out how it deals with it, and it seems to solve it by moving the evaluation into a context where GHC has no idea what the concrete types are, which is enough to prevent CSE from firing.
  • The full laziness transform will sometimes float a heap allocation outside of a lambda, if the heap allocation doesn't depend on an argument to the lambda. This one can be a giant pain. It would in fact apply to your code. Neither f nor fibs depend on the argument to fibo'. If GHC decides to apply the full laziness transform, your code might end up transformed as such:

It would transform the current form, which is equivalent to:

fibo' = \n' -> let f 0 = 1
                   f 1 = 1
                   f n = fibs !! (n-1) + fibs !! (n-2)
                   fibs = [f x | x <- [0..]]
               in f n'

To the following, with the let floated outside of the lambda:

fibo' = let f 0 = 1
            f 1 = 1
            f n = fibs !! (n-1) + fibs !! (n-2)
            fibs = [f x | x <- [0..]]
        in \n' -> f n'

The scoping matters there. After floating the let to the top, the definitions of f and fibs are shared across multiple invocations of fibo', instead of being reallocated for each call. In this case, that might introduce too much sharing, as calling fibs once with a large argument would result in holding on to that that much of the calculated list regardless of how much is needed for the rest of the program. In general I consider the full laziness transformation to be a misfeature and disable it via compiler options. I will float an allocation by hand if I want sharing.

Upvotes: 3

Ben
Ben

Reputation: 71590

As a first approximation, you can just use variable scopes to predict what will be shared.

All the places that use fibs are within the same scope, so each occurrence of that name resolves to the same object in memory. That means each time f is invoked within one one top-level fibo' call, there is one single fibs list. It's shared across f calls.

However, fibs is defined in a where scope, attached to a function equation fibo' n =. That means the scope is "inside" each call of the fibo' (on some particular n); all the names defined in the where clause refer to different objects in memory each time fibo' is called. (This is just like how local variables - defined inside a function - work in any mainstream programming language)

What you have here is a function that will work with a list of pre-computed Fibonacci numbers to save recomputation within one top-level call, but then will throw that away and start from scratch on the next top-level call.

That may be exactly what you want, so this isn't wrong per se. If fibs was defined in a scope outside the top-level application, then it would be shared across all calls, but that also means it would permanently occupy memory to remain available for the next call. As objects in Haskell can expand in memory as they are evaluated more-deeply, this could be considered a memory leak. Throwing it away after each top-level call instead means multiple calls repeat some work, but you're only using the memory needed to speed up each call (rather than the memory needed for the largest call you have ever made), and only while it is running. So which way is "better" depends on what the rest of your program is doing with this function, not on this function itself.

Note that none of the definitions in your where clause are actually using the arguments of fibo', and fibo' itself isn't really "doing anything"; it just forwards the call immediately to f. So the only "interesting" effect of putting your code in a where like this is creating a scope where you can define fibs inside the top-level call but outside the inner f calls. (I'm considering the access control effects to be non-interesting). If you didn't want that, and did want fibs to be shared across calls, it would be simpler to just define your code like this:

fibo'' 0 = 1
fibo'' 1 = 1
fibo'' n = fibs !! (n-1) + fibs !! (n-2)

fibs = [fibo'' x | x <- [0..]]

Now fibs is just a single object in the global scope, and will be shared across all calls (but will occupy memory across all calls too).

If you do care about the access-control (nothing else can access fibs if it's defined in a local scope, but can if it's defined in a global scope), you can instead do this:

fibo''' = f
  where 
    f 0 = 1
    f 1 = 1
    f n = fibs !! (n-1) + fibs !! (n-2)
    fibs = [f x | x <- [0..]]

Here fibs (and f) are defined in a local scope. But the equation which the where is attached to is just fibo''' = f. This local scope is still "outside" the top-level call, because the scope is "entered" before fibo''' receives its argument. Whereas the original fibo' has the where attached to an equation that has already brought the argument n into scope, so the where scope is "entered" after each top-level call is made.


Now, I did start this post with "as a first approximation". The compiler is allowed to reorganise your code to try and optimise it. For example it could theoretically notice that fibs doesn't depend on the argument n, and so move it outside the top-level call, making your fibo' behave like fibo'''. Or (again theoretically) it could decide to inline fibs which would remove all sharing and make your fibo' behave like a naiive recursive Fibonacci calculator.

In practice, however, it is extremely unlikely to do either of these things. It's "allowed" to do any transformation that doesn't change the end result of your code, but the transformations the GHC developers actually put into the compiler are designed to make your code better. So they try very hard not to increase or reduce sharing in a way that causes large memory leaks or slowdowns.

So while the compiled and optimised code frequently is quite different from what you actually wrote, it's pretty safe to reason about your code under the assumption that "if you give something a name, all uses of that name within the same scope will be shared". You just have to be careful about when exactly local scopes are "entered" so that you know what counts as "within the same scope".

Upvotes: 5

Related Questions