Kevin Meredith
Kevin Meredith

Reputation: 41909

Understanding Haskell's `map` - Stack or Heap?

Given the following function:

f :: [String]
f = map integerToWord [1..999999999]

integerToWord :: Integer -> String

Let's ignore the implementation. Here's a sample output:

ghci> integerToWord 123999
"onehundredtwentythreethousandandninehundredninetynine"

When I execute f, do all results, i.e. f(0) through f(999999999) get stored on the stack or heap?

Note - I'm assuming that Haskell has a stack and heap.

After running this function for ~1 minute, I don't see the RAM increasing from its original usage.

Upvotes: 0

Views: 322

Answers (3)

Luis Casillas
Luis Casillas

Reputation: 30217

GHC programs use a stack and a heap... but it doesn't work at all like the eager language stack machines you're familiar with. Somebody else is gonna have to explain this, because I can't.

The other challenge in answering your question is that GHC uses the following two techniques:

  1. Lazy evaluation
  2. List fusion

Lazy evaluation in Haskell means that (as the default rule) expressions are only evaluated when their value is demanded, and even then they may only be partially evaluated—only far enough as needed to resolve a pattern match that requires the value. So we can't say what your map example does without knowing what is demanding its value.

List fusion is a set of rewrite rules built into GHC, that recognize a number of situations where the output of a "good" list producer is only ever consumed as the input of a "good" list consumer. In these cases, Haskell can fuse the producer and the consumer into an object-code loop without ever allocating list cells.

In your case:

  1. [1..999999999] is a good producer
  2. map is both a good consumer and a good producer
  3. But you seem to be using ghci, which doesn't do fusion. You need to compile your program with -O for fusion to happen.
  4. You haven't told us what would be consuming the output of the map. If it's a good consumer it will fuse with the map.

But there's a good chance that GHC would eliminate most or all of the list cell allocations if you compiled (with -O) a program that just prints the result of that code. In that case, the list would not exist as a data structure in memory at all—the compiler would generate object code that does something roughly equivalent to this:

for (int i = 1; i <= 999999999; i++) {
    print(integerToWord(i));
}

Upvotes: 0

MathematicalOrchid
MathematicalOrchid

Reputation: 62818

First: To split hairs, the following answer applies to GHC. A different Haskell compiler could plausibly implement things differently.

There is indeed a heap and a stack. Almost everything goes on the heap, and hardly anything goes on the stack.

Consider, for example, the expression

let x = foo 17 in ...

Let's assume that the optimiser doesn't transform this into something completely different. The call to foo doesn't appear on the stack at all; instead, we create a note on the heap saying that we need to do foo 17 at some point, and x becomes a pointer to this note.

So, to answer your question: when you call f, a note that says "we need to execute map integerToWord [1..999999999] someday" gets stored on the heap, and you get a pointer to that. What happens next depends on what you do with that result.

If, for example, you try to print the entire thing, then yes, the result of every call to f ends up on the heap. At any given moment, only a single call to f is on the stack.

Alternatively, if you just try to access the 8th element of the result, then a bunch of "call f 5 someday" notes end up on the heap, plus the result of f 8, plus a note for the rest of the list.

Incidentally, there's a package out there ("vacuum"?) which lets you print out the actual object graphs for what you're executing. You might find it interesting.

Upvotes: 2

zerkms
zerkms

Reputation: 254916

To be precise - when you "just execute" f it's not evaluated unless you use its result somehow. And when you do - it's stored according to how it's required to fulfill the caller requirements.

As of this example - it's not stored anywhere: the function is applied to every number, the result is output to your terminal and is discarded. So at a given moment in time you only allocate enough memory to store the current value and the result (which is an approximation, but for the case it's precise enough).

References:

Upvotes: 6

Related Questions