Haskell custom parser terminology

Question

So I'm doing a homework assignment to fill in implementation for a few functions that work with a custom Haskell parser (this only works with boolean logic). I'm trying to understand what the code means or is supposed to do on a high level. Some of this (marked with "(wrong?)") was implemented by me but I think most of it is wrong as I don't really understand what is going on.

What does it mean for Haskell to parse a con/disjunction ast? This should make something that looks like p T_OR q in parseOr right? How about parseNegation, parseNot, and parseParens?

Sorry for the broad scope I just don't understand what these are supposed to be doing. This is all the relevant code so please give your best guess if you can't tell.

parseOr implements the single rule D --> C | D. The definition says:

  - parse a conjunction AST and name it "p"
  - ensure that the next symbol in the stream is T_OR
  - parse a disjunction AST and name it "q"
  - return the AST "AND p q"

> parseOr = do
>   p <- parseConjunction
>   eat T_OR
>   q <- parseDisjunction
>   return (OR p q)

parseDisjunction implements the CFG rules for the D nonterminal:

  D --> C | D
  D --> C

> parseDisjunction = parseOr <|> parseConjunction

parseAnd implements the rule C --> N & C.

> parseAnd = do
>   p <- parseNegation
>   eat T_AND
>   q <- parseConjunction
>   return (AND p q)

parseConjunction should implement the rules for the C nonterminal:

  C --> N & C
  C --> N

It will probably look almost identical to parseDisjunction, but with different
parsers being combined.

(wrong?)
> parseConjunction = parseAnd <|> parseDisjunction 

parseNot should implement the rule N --> !N.

(wrong?)
> parseNot = do 
>   p <- parseNegation
>   eat T_NOT
>   return (NOT p)

parseNegation should implement the rules for the N nonterminal:

  N --> !N
  N --> A

(wrong?)
> parseNegation = parseNot <|> parseConjunction 

parseParens should implement the rule A --> (D).

(wrong?)
> parseParens = do 
>   eat L_PAREN
>   p <- parseParens
>   eat R_PAREN
>   return (p)

eat implementation

eat :: Eq t => t -> Parser t ()
eat x = Parser $ 	s ->
  case ts of
    [] -> Nothing
    (t:ts)
      | x == t -> Just ((), ts)
      | otherwise -> Nothing

K. A. Buhr · Accepted Answer

What you have here is a recursive descent parser written in a monadic style, sometimes called a "monadic parser". This a very common method for writing parsers in Haskell, and there are a number of very popular parser libraries (e.g., Parsec) that support writing parsers in this style.

It looks like this one might have been hand-written for this assignment, so you probably have some boilerplate code that includes a Parser type with Applicative, Monad, and Alternative instances defined, plus the eat utility function you included in your question. You probably also have a parse or runParser helper function that would be used to actually apply one of the parse functions (e.g., parseDisjunction) to a list of input tokens.

On the one hand, the nuts and bolts of how these parsers work can be pretty complicated. However, at a high level, you can quickly develop an intuitive understanding of how they behave. In fact, it looks like your professor probably thought that just one example (parseOr with some comments) would be enough to explain it, though that might have been optimistic!

If it's not clear to you, the way this parser is supposed to be used is that we assume that some human-readable representation of a Boolean expression, say the string:

"A and B or C or not D"

has already been processed into a list of "tokens":

ts = [T_ATOM "A", T_AND, T_ATOM "B", T_OR, T_ATOM "C", T_OR, T_NOT, T_ATOM "D"]

Note that I didn't see T_ATOM in your code, but I presume you have something like it in the definition of the token data type.

The purpose of the parser functions, like parseOr, is to take this token list as input and produce an abstract syntax tree (AST) to represent it. In this case, the AST will look like this:

OR (AND (ATOM "A") (ATOM "B")) (OR (ATOM "C") (NOT (ATOM "D")))

So when the comments say "parse a conjunction AST and name it p", they are referring to the fact that the statement:

p <- parseConjunction

will take some tokens from the beginning of the list of input tokens and turn them into AST representation of a conjunction. Specifically, if parseOr is applied to the token list ts above, this first statement p <- parseConjunction will take the tokens:

T_ATOM "A", T_AND, T_ATOM "B"

from the beginning of the input list and parse them into the following AST which will be assigned to p:

(AND (ATOM "A") (ATOM "B"))

Now, the nice thing about writing parsers in this style is that you don't have to explicitly manipulate the token list. Instead, you can write parsers in a very abstract manner that closely matches the formal rules of the grammar. For example, the rule:

D -> C | D

can be pretty directly translated into:

parseOr = do
  p <- parseConjuction  -- this parses the conjunction "C" on the RHS
  eat T_OR              -- this parses the terminal symbol "|" 
  q <- parseDisjunction -- this parses the disjunction "D" on the RHS
  return (OR p q)       -- the produces the AST result of the parse
                        -- which is the disjunction "D" on the LHS

while a choice between multiple grammar rules:

D -> C | D
D -> C

can be translated into:

parseDisjunction =
  parseOr                 -- try to parse "C | D" RHS
  <|> parseConjunction    -- if that fails, try just the "C" RHS

Does that help?

Haskell custom parser terminology

Answers (1)

Related Questions