TheMP

Reputation: 8427

How are variable names chosen in type signatures inferred by GHC?

When I play with checking types of functions in Haskell with :t, for example like those in my previous question, I tend to get results such as:

Eq a => a -> [a] -> Bool
(Ord a, Num a, Ord a1, Num a1) => a -> a1 -> a
(Num t2, Num t1, Num t, Enum t2, Enum t1, Enum t) => [(t, t1, t2)]

It seems that this is not such a trivial question: how does the Haskell interpreter pick the names for the type variables it infers? When would it choose a rather than t? When would it choose a1 rather than b? Is it important from the programmer's point of view?

Upvotes: 6

Views: 930

Answers (3)

Bakuriu

Reputation: 101989

The names of the type variables aren't significant. The type:

Eq element => element -> [element] -> Bool

is exactly the same as:

Eq a => a -> [a] -> Bool

Some names are simply easier to read/remember.
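To see this concretely, here is a small example of my own (not from the question) showing that one definition type-checks against either spelling of the signature, since they differ only in the variable name:

-- Two alpha-equivalent signatures for the same function: GHC accepts
-- elemShort as a definition equal to elemLong, because only the
-- type variable name differs.
elemLong :: Eq element => element -> [element] -> Bool
elemLong x xs = any (== x) xs

elemShort :: Eq a => a -> [a] -> Bool
elemShort = elemLong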

Now, how can an inferencer choose the best names for types?

Disclaimer: I'm absolutely not a GHC developer. However, I'm working on a type inferencer for Haskell for my bachelor's thesis.

During inference, the names chosen for the variables probably aren't that readable. In fact, they are almost surely something along the lines of _N or aN, with N a number.

This is because you often have to "refresh" type variables in order to complete inference, so you need a fast way to create new names, and numbered variables are straightforward to generate.
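As a rough illustration (my own sketch, not GHC's actual internals), such a fresh-name supply can be as simple as a counter threaded through the inferencer with the State monad:

import Control.Monad.State (State, evalState, get, put)

-- A counter-based supply of fresh type-variable names: each call to
-- fresh returns a new numbered name such as "_0", "_1", ...
type Supply = State Int

fresh :: Supply String
fresh = do
  n <- get
  put (n + 1)
  pure ('_' : show n)

-- evalState (mapM (const fresh) [(), (), ()]) 0  ==>  ["_0","_1","_2"]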

The names displayed once inference is complete can then be "pretty printed": the inferencer can rename the variables to use a, b, c and so on instead of _1, _2, etc.
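A toy version of such a renaming pass (again my own sketch) maps each numbered variable, in order of first appearance, to a, b, c and so on:

import Data.List (nub)

-- Rename numbered variables to "a", "b", "c", ... in order of first
-- appearance; fall back to a1, b1, ... if the alphabet runs out.
prettyRename :: [String] -> [(String, String)]
prettyRename vars = zip (nub vars) alphabet
  where
    alphabet = [[c] | c <- ['a' .. 'z']]
            ++ [c : show n | n <- [1 :: Int ..], c <- ['a' .. 'z']]

-- prettyRename ["_2", "_5", "_2"]  ==>  [("_2","a"),("_5","b")]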

The trick is that most operations have explicit type signatures. Some definitions also require quantifying type variables (class, data and instance declarations, for example). All these names that the user explicitly provides can be used to display the type in a better way.

When inferencing, you can somehow keep track of where the fresh type variables came from, in order to rename them to something more sensible when displaying them to the user. Another option is to refresh variables by adding a number to them. For example, a fresh type of return could be Monad m0 => a0 -> m0 a0 (here we know to use m and a simply because the class definition for Monad uses those names). When inference is finished, you can get rid of the numbers and obtain the pretty names.
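As an illustration, a refresh step over a tiny type representation (an assumed AST of my own, not GHC's) might simply suffix every variable with the current counter, which is exactly how Monad m => a -> m a becomes Monad m0 => a0 -> m0 a0:

-- A minimal type AST and a refresh pass that suffixes each variable
-- with a counter while preserving the user-provided base name.
data Ty = TyVar String | TyFun Ty Ty | TyApp Ty Ty
  deriving Show

refresh :: Int -> Ty -> Ty
refresh n (TyVar v)   = TyVar (v ++ show n)
refresh n (TyFun a b) = TyFun (refresh n a) (refresh n b)
refresh n (TyApp f x) = TyApp (refresh n f) (refresh n x)

-- refresh 0 (TyFun (TyVar "a") (TyApp (TyVar "m") (TyVar "a")))
--   ==> TyFun (TyVar "a0") (TyApp (TyVar "m0") (TyVar "a0"))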

In general, the inferencer will try to use names that were explicitly provided through signatures. If such a name was already used, it might decide to add a number instead of using a different name (e.g. use b1 instead of c if b was already bound).

There are probably some other ad hoc rules. For example, the naming of tuple elements as t, t1, t2, t3, etc. is probably handled by a custom rule: t doesn't even appear in the signature of (,,).
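You can check in GHCi that t isn't inherited from the tuple constructor's own signature:

Prelude> :t (,,)
(,,) :: a -> b -> c -> (a, b, c)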

Upvotes: 8

dfeuer

Reputation: 48611

The question "How does GHCi pick names for type variables?" explains how many of these variable names come about. As Ganesh Sittampalam pointed out in a comment, something strange seems to be happening with arithmetic sequences. Both the Haskell 98 report and the Haskell 2010 report indicate that

[e1..] = enumFrom e1

GHCi, however, gives the following:

Prelude> :t [undefined..]
[undefined..] :: Enum t => [t]

Prelude> :t enumFrom undefined
enumFrom undefined :: Enum a => [a]

This makes it clear that the weird behavior has nothing to do with the Enum class itself, but rather comes in from some stage in translating the syntactic sequence to the enumFrom form. I wondered if maybe GHC wasn't really using that translation, but it really is:

{-# LANGUAGE NoMonomorphismRestriction #-}
module X (aoeu,htns) where
aoeu = [undefined..]
htns = enumFrom undefined

compiled using ghc -ddump-simpl enumlit.hs gives

X.htns :: forall a_aiD. GHC.Enum.Enum a_aiD => [a_aiD]
[GblId, Arity=1]
X.htns =
  \ (@ a_aiG) ($dEnum_aiH :: GHC.Enum.Enum a_aiG) ->
    GHC.Enum.enumFrom @ a_aiG $dEnum_aiH (GHC.Err.undefined @ a_aiG)

X.aoeu :: forall t_aiS. GHC.Enum.Enum t_aiS => [t_aiS]
[GblId, Arity=1]
X.aoeu =
  \ (@ t_aiV) ($dEnum_aiW :: GHC.Enum.Enum t_aiV) ->
    GHC.Enum.enumFrom @ t_aiV $dEnum_aiW (GHC.Err.undefined @ t_aiV)

so the only difference between these two representations is the assigned type variable name. I don't know enough about how GHC works to know where that t comes from, but at least I've narrowed it down!


Ørjan Johansen has noted in a comment that something similar seems to happen with function definitions and lambda abstractions.

Prelude> :t \x -> x
\x -> x :: t -> t

but

Prelude> :t map (\x->x) $ undefined
map (\x->x) $ undefined :: [b]

In the latter case, the type variable b comes from the explicit type signature given to map.
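For reference, the Prelude signature of map is:

map :: (a -> b) -> [a] -> [b]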

Upvotes: 3

Matt Fenwick

Reputation: 49105

Are you familiar with the concepts of alpha equivalence and alpha substitution? This captures the notion that, for example, both of the following are completely equivalent and interconvertible (in certain circumstances) even though they differ:

\x -> (x, x)
\y -> (y, y)

The same concept can be extended to the level of types and type variables (see "System F" for further reading). Haskell in fact has a notion of "lambdas at the type level" for binding type variables, but it's hard to see because they're implicit by default. However, you can make them explicit by using the ExplicitForAll extension, and play around with explicitly binding your type variables:

ghci> :set -XExplicitForAll 
ghci> let f x = x; f :: forall a. a -> a

In the second line, I use the forall keyword to introduce a new type variable, which is then used in a type.
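The same point can be made in a compiled module (a minimal example of my own): two signatures that differ only in the name bound by forall are interchangeable:

{-# LANGUAGE ExplicitForAll #-}

f :: forall a. a -> a
f x = x

-- Accepted: forall t. t -> t is alpha-equivalent to forall a. a -> a.
g :: forall t. t -> t
g = f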

In other words, it doesn't matter whether you choose a or t in your example, as long as the type expressions satisfy alpha-equivalence. Choosing type variable names so as to maximize human convenience is an entirely different topic, and probably far more complicated!

Upvotes: 0
