Maximilian Wilson
Maximilian Wilson

Reputation: 461

Why is F# compiler sometimes incorrectly generalizing functions?

I ran into some unexpected behavior from the F# compiler recently. I was able to find a workaround, but the original behavior baffles me and I wanted to see if anyone can help me understand what causes it.

A function that I had defined as non-generic was becoming generic, which interfered with the ability of the function to share state between multiple invocations. I simplified my use-case down to the following:

let nextId =
  let mutable i = 0
  let help (key:obj) =
    i <- i + 1
    i
  help
nextId "a" // returns 1
nextId "b" // also returns 1!!!!

Why is nextId of type 'a -> int instead of obj -> int? Clearly the generalization is also responsible for the bug where it returns 1 repeatedly, but why is the generalization occurring in the first place?

Note that if I define it without naming the nested function, it works as expected in giving unique Ids:

let nextId =
  let mutable i = 0
  fun (key:obj) ->
    i <- i + 1
    i
nextId "a" // returns 1
nextId "b" // returns 2

But even more mysterious, with this definition, F# Interactive can't decide whether nextId is an (obj -> int) or an ('a -> int). When I first define it I get

val nextId : (obj -> int)

but if I simply eval

nextId

I get

val it : ('a -> int)

What's going on here and why does my simple function get automatically generalized?

Upvotes: 9

Views: 199

Answers (2)

scrwtp
scrwtp

Reputation: 13577

I can't tell why the compiler decides to generalize in your first scenario, but ultimately the distinction between nextId being of type obj -> int vs 'a -> int is what drives the seemingly weird behavior here.

For what it's worth, you can "force" the expected behavior in your first scenario with yet another type annotation:

let nextId : obj -> int =
    let mutable i = 0
    let help (key:obj) =
        i <- i + 1
        i
    help

Now if you put those values in modules (like in this gist), compile and inspect the assembly in ILSpy, you'll find that the code is almost identical except for where the ref cell for the counter is instantiated:

  • In the concrete case, nextId is a property yielding a function which is instantiated together with the ref cell in the static initializer for the module, i.e. all calls to nextId share the same counter,

  • In the generic case, nextId is a generic function yielding a function, and the ref cell is instantiated within its body, i.e. you have a counter per a call to nextId.

So the code emitted in the generic case could actually be rendered in F# with this snippet:

let nextId () =
    let mutable i = 0
    fun key ->
        i <- i + 1
        i

The bottom line is that it would make sense to emit a compiler warning when you have a generic value like this. It's easy to avoid the problem once you know it's there, but it's one of those things you won't see coming otherwise.

Upvotes: 5

Tomas Petricek
Tomas Petricek

Reputation: 243041

I agree that this is quite unexpected behaviour. I think the reason why F# performs the generalization is that it treats help (when returning it) as fun x -> help x. Calling a function that takes obj seems to be one case where the compiler performs generalization (because it knows that anything can be obj). The same generalization happens, for example, in:

let foo (o:obj) = 1
let g = fun z -> foo z

Here, g becomes 'a -> int too, just like in your first version. I don't quite know why the compiler does this, but what you see can be explained by 1) treating help as fun x -> help x and 2) generalising on calls taking obj.

The other thing that is happening is how F# treats generic values - generic values are generally problematic in ML languages (that's what the whole "value restriction" business is about), but F# allows it in some limited cases - you can for example write:

let empty = []

This defines a generic value of type 'a list. The caveat is that this gets compiled as a function that is called every time you access the empty value. I think your first nextId function gets compiled in the same way - so the body is evaluated each time you access it.

This probably does not answer the why but I hope it provides some more tips on how this is happening - and in what other cases the behaviour you're seeing might be sensible!

Upvotes: 8

Related Questions