Understanding parsec type annotations

Question

I'm using parsec to parse some source code into an AST. I recently turned on the -Wall and -W options in order to catch "suspicious" code, and it is complaining about many of the parsec-related top-level functions in this file not having explicit type extensions.

Example 1

vimL = choice [ block
              , statement
              ]

The inferred type here is:

vimL :: ParsecT String () Data.Functor.Identity.Identity Node

So, if I add that annotation, the compiler complains about not having access to Data.Functor.Identity.Identity, which means I have to import it:

import Data.Functor.Identity

And if I do that, I can simplify the type annotation to:

vimL :: ParsecT String () Identity Node

and the compiler will still accept it. But it's still not something I understand very deeply.

Example 2

link = Link <$> (bar *> linkText <* bar)
  where
    bar      = char '|'
    linkText = many1 $ noneOf " 	
|"

The inferred type here is:

link :: forall u.
         ParsecT String u Data.Functor.Identity.Identity Node

But I can't use that unless I also use:

{-# LANGUAGE RankNTypes #-}

Note that I can dispense with that if I drop the forall. Both of these work:

link :: ParsecT String u Data.Functor.Identity.Identity Node
link :: ParsecT String u Identity Node

Example 3

string' s = mapM_ char' s >> pure s  s

This one's inferred type is:

string' :: forall s u (m :: * -> *).
            Stream s m Char =>
             [Char] -> ParsecT s u m [Char]

In order to use that one, I need both:

{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE KindSignatures #-}

But again, if I drop the forall I can simplify the type to the following and the compiler still accepts it:

string' :: Stream s m Char => [Char] -> ParsecT s u m [Char]

Not very simple, though. Going further and dropping the constaint:

string' :: [Char] -> ParsecT s u m [Char]

I get:

No instance for (Stream s m Char) arising from a use of ‘char'’

I thought:

{-# LANGUAGE NoMonomorphismRestriction #-}

might get me out of that, but it does not.

Questions

These mostly go over my head, so I don't want to blindly copy-paste the inferred type signatures in without gaining some more insight first. Can anybody shed some light on what these mean, what would be the best practices for annotating parsec-heavy code, what does the forall buy me if it can be omitted without causing any compiler errors, and whether there are any aliasing tricks that I can employ to make these more readable?

Alec · Accepted Answer

I'm no expert with parsec, so I'll let someone else do the heavy lifting if they want to explain the types, but here are some thoughts:

Generally packages try to export more friendly type synonyms. In this case, you can use

type Parsec s u = ParsecT s u Identity -- in Text.Parsec.Prim
type Parser = Parsec String ()         -- in Text.Parsec.String

so that gets you vimL :: Parser Node, which should make more sense - it is a parser that can be run on a String to produce a Node.

forall gets you very little in this context, which is why there are friendly type synonyms available, which you should use. However, I'm willing to bet that in its own guts parsec makes heavy use of higher-rank types, which cannot be expressed without the forall and this is why the signatures GHC proposes to you have an explicit forall.

(Briefly, forall x. is the same as but if you have the forall in the middle of the signature, things get much nastier.)

EDIT

Some stuff on parsec (from the documentation). The type ParsecT s u m a represents the most general parser possible. Reading the comments in the source helps.

s describes the stream type. A parser in the abstract sense takes a sequence of symbols and converts them into some structured output form.
a is the type of the output form.
u is the user state type. parsec already keeps track of some state information (like your position in the text so that it can give you back a meaningful parse error message) so it makes sense to let the user package in some state they would like carried along (there is an example of this in 2.12 Advanced: User State)
m is the underlying monad in which things are being run. I think this part will be evident iff you grok monads...

Then, a couple special cases arise:

Taking m = Identity means that we don't need a monadic context of execution. (The Parsec s u a type synonym is for this case.)
Taking u = () means that we don't need to hold any state information.
Taking s = String means our input (stream) will be a string. (Combined with the other two options above, that is what the Parser a type synonym is for.)

Finally, string' :: forall s u (m :: * -> *). Stream s m Char => [Char] -> ParsecT s u m [Char] means that the output is a String = [Char], and the user state, monadic context, and input can be anything - provided they satisfy some conditions, hence the Stream s m Char constraint.

That constraint Stream s m t means that you must be able to "unfold" the stream input type s into an m (Maybe (t,s)). The m part means that this unfolding can occur in a monadic context, the Maybe part deals with the fact that you can unfold only as long as you have input, the t is token you are taking off the front of the stream, and s is the remainder of the stream. Finally, the type of the stream s has to uniquely identify the type of the token t coming out, so there is a functional dependency s -> t.