Luke Xu
Luke Xu

Reputation: 2440

How does variable binding work with recursion in Haskell?

I was reading about Haskell and it stated that once a variable is bound to an expression it cannot be rebound for example

x = 10
x = 11

Assign.hs:2:1: error:
    Multiple declarations of ‘x’
    Declared at: Assign.hs:1:1
                 Assign.hs:2:1

So if this is the case... how is recursion working where the same variable keeps getting bound to other things? For example

drop n xs = if n <= 0 || null xs
              then xs
              else drop (n-1) (tail xs)

the variable n... is being bound again and again every time the function recurses. Why is this allowed?

Also if you guys could tell me some key words to search for so I can learn more about this myself I'd highly appreciate it.

Upvotes: 3

Views: 1766

Answers (3)

Ben
Ben

Reputation: 71535

In almost any language, identifiers are only meaningful because they exist in a scope; conceptually a sort of implicit map from names to the things they denote.

In modern languages you usually have (at least) a global scope, and each local scopes associated with each function/procedure. Pseudocode example:

x = 1
print x
x = 2
print x

function plus_one(a):
  b = 1
  return a + b

print plus_one(x)

x is a name in the global scope, a and b are names in the local scope of the function plus_one.

Imperative languages (and non-pure declarative languages) are generally understood by thinking of those names as mapped to a sort of slot or pigeon hole, into which things can be stored by assignment, and the current connects referred to by using the name. This works only because an imperative step-by-step way of thinking about these programs gives us a way of understanding what "current" means.1 The above example shows this; x is first assigned 1, then printed, then assigned 2, then printed again; we'd expect this to print "1" and then print "2" (and then print "3" from the last line).

Given that understanding of variables as slots that can store things, it's easy to fall into the trap of thinking of the local variables that represent a function's arguments as just slots that get filled when you call the function. I say trap because this is not a helpful way of thinking about function arguments and calls, even in imperative languages. It more or less works as long as each function only ever has one call "in flight" at once, but introducing one of any number of common programming features breaks this model (recursion, concurrency, laziness, closures, etc). That's the misunderstanding at the heart of a number of questions I've seen here on SO, where the poster was having trouble understanding recursive calls, or wanted to know how to access the local variables of a function from outside, or etc.

You should actually think of a function as having a separate scope associated with each call of that function2. The function itself is kind of like a template for a scope, rather than a scope itself (although common language does usually talk about "the scope of a function" as shorthand). If you provided bindings for the function's parameters then you can produce a scope, but a typical function is called many times with different parameters, so there isn't just one scope for that function.

Consider my pseudocode plus_one, with argument a. You could imagine that a is a local name for a variable, and the call plus_one(x) just assigns the contents of x into the slot for a and then starts executing the code of plus_one. But I contend that it's better to think that when you call plus_one on x you're creating a new scope, in which there is a variable called a (containing with the contents of the global scope x at this point), but it is not "the" a variable.

This is vitally important for understanding recursion, even in imperative languages:

function drop(n, xs):
  if n <= 0 || null(xs):
    return xs
  else:
    return drop(n - 1, tail(xs))

Here we could try to imagine that there's only one xs variable, and when we make the recursive call to drop we're assigning the tail of the original xs to the xs variable and starting the function's code again. But that falls down as soon as we change it to something like:

function drop(n, xs):
  if n <= 0 || null(xs):
    return xs
  else:
    result = drop(n - 1, tail(xs))
    print xs
    return result

Now we're using xs after the recursive call. What this does is hard to explain if we're imagining that there's only one xs, but trivial if we think of there being a separate xs in a separate scope every time we call drop. When we make the recursive call to drop (passing it n - 1 and tail(xs)) it creates its own separate xs, so it's entirely unmysterious that print xs in this scope still can access xs.

So, is this story different with Haskell? It's true that the nature of variables is quite different in Haskell than from typical recursive languages. Instead of scopes mapping names to slots in which we can place different contents at different times, scopes map names directly to values, and there's no notion of time in which we could say an identifier "was" bound to one value (or was not bound to anything) and "now" is bound to a different value. x = 1 at global scope in Haskell isn't a step that "happens" and changes something, it's just a fact. x just is 1. Haskell isn't throwing an error at your x = 10 and x = 11 lines because the designers wanted to restrict you to only assigning a variable once, rather Haskell does not have a concept of assignment at all. x = 10 is giving a definition for x in that scope, and you can't have two separate definitions for the same name in the same scope.

The local scopes that come from functions in Haskell are the same; each name just is associated with a particular value3. But the actual values are parameterised on the values on which you call the function (they are quite literally a function of those values); each call has its own separate scope, with a different mapping from names to values. It's not that at each call n "changes" to become bound to the new argument, just that each call has a different n.

So recursion affects variable binding in Haskell in pretty much the same way it does in all major imperative languages. What's more, the extremely flippant way to describe how recursion affects name binding in all of these languages is to say "it doesn't". Recursion actually isn't special at all in this way; parameter passing, local scopes, etc, works exactly the same way when you call a recursive function as when you call an "ordinary" non-recursive functions, provided you understand the local scopes as being associated with each call of the function, not as a single thing associated with the function.


1 Sometimes even the scope mapping itself is mutable, and entries mapping names to slots can be added and removed as steps of the program. Python, for example, has a mutable global scope (in each module); you can add and remove module global variables dynamically, even with names determined from runtime data. But it uses immutable local scopes: a function's local variables exist even before they've been assigned a value (or after that value has been removed with del).

2 At least, this is how functions/procedures very common work in modern languages. It's not totally universal, but it's generally acknowledged to be a good way for functions/procedures to work.

3 Of course, thanks to laziness the particular value might be the bottom value, which is the "value" we pretend is the value of infinite loops and other forms of nontermination, so that we can interpret expressions as always having a value.

Upvotes: 1

Michal Charemza
Michal Charemza

Reputation: 27032

From https://en.wikibooks.org/wiki/Haskell/Variables_and_functions,

Within a given scope, a variable in Haskell gets defined only once and cannot change.

but scope isn't just the code of the function. Roughly, it's the code of function + its context, i.e. the arguments it's been called with, and anything from its outer scope. This is often referred to as a closure. Every time you call a function, the code runs under a new closure, and so the variables can evaluate to different things.

The recursive case isn't anything special with regards to this: a function being called twice by other code would run with a different closure, and its internal variables can evaluate to different things.

Upvotes: 1

Davislor
Davislor

Reputation: 15144

Haskell has lexical scoping. When you declare a variable inside a scope, it hides any variable with the same name from an outer scope. For example, in the following program,

x :: Int
x = 5

main :: IO ()
main = do
         x <- readLn :: IO Int
         print x

If you compile with ghc -Wall, it will compile correctly, but give you the following warnings:

sx-scope.hs:2:1: warning: [-Wunused-top-binds]
    Defined but not used: ‘x’

sx-scope.hs:6:10: warning: [-Wname-shadowing]
    This binding for ‘x’ shadows the existing binding
      defined at sx-scope.hs:2:1

So the x outside main and the x in main are different variables, and the one in the inner scope temporarily hides the one in the outer scope. (Okay, “temporarily” in this example means until main returns, which is as long as the program is running, but you get the concept.)

Similarly, when you declare drop n xs or \n xs ->, n and xs are local variables that only exist during that invocation of the function. It can be called more than once with different values of n and xs.

When a function is tail-recursive, that is, returns a call to itself, the compiler knows it is about to replace the old parameters, which were from a scope that no longer exists, with updated parameters of the same type. So it can re-use the stack frame and store the new parameters in the same locations as the previous ones. The resulting code can be as fast as iteration in a procedural language.

Upvotes: 3

Related Questions