Reputation:
I'm trying to write a lisp interpreter in haskell, inspired by Norvig's in Python (http://norvig.com/lispy.html). I have a successful tokenizer which I can link to if need be. Here it outputs the correct code up to Norvig's Python tokenizer.
program = "(begin (define r 10) (* pi (* r r)))"
astTokenized = tokenize program
astTokenized == ["(","begin","(","define","r","10",")","(","*","pi","(","*","r","r",")",")",")"]
Here I define my abstract syntax tree data type, although I know that it already has some implicit error as it isn't wrapped in a list.
data Ast x = Val x | Node [Ast x] deriving (Show)
Here was my first try:
parse :: [[Char]] -> [Ast [Char]]
parse (x:xs)
| x == "(" = [Node (parse xs)]
| x == ")" = []
| otherwise = (Val x) : parse xs
Hopeful, except for it terminates after the first ')'.
Prelude> parse astTokenized
[Node [Val "begin",Node [Val "define",Val "r",Val "10"]]]
Here I change the base case for [], and adjust the condition for ')' so it will parse the entire input terminate, but it now just creates a deeper tree, therefore failing to branch correctly.
parse [] = []
parse (x:xs)
| x == "(" = [Node (parse xs)]
| x == ")" = parse xs
| otherwise = (Val x) : parse xs
Prelude> parse astTokenized
[Node [Val "begin",Node [Val "define",Val "r",Val "10",Node [Val "*",Val "pi",Node [Val "*",Val "r",Val "r"]]]]]
In some way it needs to allow for "parallel" trees, not merely nested. Any help would be appreciated.
Upvotes: 6
Views: 1709
Reputation: 38952
The question being homework/exercise, what follows is an explanation that attempts to avoid giving away the solution directly; it does so by being written in Common Lisp (that, and because I don't know Haskell ;-) ).
In Common Lisp there is an intermediate function called READ-DELIMITED-LIST
, which reads multiple forms as a list until the reader reaches an ending character. When encountering an opening parenthesis, the reader grabs all forms until the closing parenthesis, and then continue parsing from there. The difference is that it works on streams of characters, not on tokens, and that the stream is used for side-effects.
In a purely functional approach, like in your code, the parse function needs to return the remaining tokens to be processed, along with the returned AST. This allows you to consume as many tokens as you need during parsing, and allows the caller to continue parsing from where the parser ended.
In other words, when you close a parenthesis, you have to return xs
to the calling context. And so, you carry an accumulator object (the state) along with your code. I heard that monads can help you with the boilerplate.
MULTIPLE-VALUE-BIND
and VALUES
work together: (values x y)
returns multiple values and multiple-value-bind
captures multiple values from an expression, and bind each of them to a variable.
DESTRUCTURING-BIND
is the ancestor of pattern matching, and simply destructures a list into components.
A non-special form (f x1 .. x2)
is function application
(case test . clauses)
is a switch, where each clause is a list starting with a literal value (or a list of such values) and followed by expressions. t
and otherwise
are special keywords that always match (the default case).
The rest should be pretty self-explanatory (ask otherwise).
Notice that I use symbols <
and >
to represent respectively the opening and closing tokens, to avoid any confusion with Lisp's parentheses. For example, the list (< a b c < d > >)
contains 8 tokens, and could be the internal representation of "(a b c (d))"
. I could have use left-paren
and right-paren
, or even complex data-structures, this is just an internal representation. The lexer is not detailed here.
The entry point parse
function takes a list of tokens and returns the parsed value and the remaining tokens as a secondary value; it depends on parse-until-close
, defined below:
(defun parse (tokens)
(when tokens
(destructuring-bind (head . tail) tokens
(case head
(> (error "unmatched closing parenthesis"))
(< (parse-until-close tail))
(otherwise (values head tail))))))
Then, parse-until-close
, a function that recursively parse tokens until it finds a close token; note that tokens
is re-bound at different points:
(defun parse-until-close (tokens)
(when tokens
(case (first tokens)
(> (values nil (rest tokens)))
(otherwise
;; first read the element in head of tokens
(multiple-value-bind (head tokens) (parse tokens)
;; then recurse to read the remaining items in list
(multiple-value-bind (tail tokens) (parse-until-close tokens)
(values (cons head tail) tokens)))))))
The above recursively parse tokens and build a list. If our list of tokens starts with >
, the close token, we return the empty list as well as the rest of the tokens.
Otherwise, we parse one element and recurse with parse-until-close
.
Each call returns two values, the parsed token and the remaining ones:
(parse '(one token))
=> ONE
(TOKEN)
(parse '(< abc < x > y >))
=> (ABC (X) Y)
NIL
(parse '(< abc def >))
=> (ABC DEF)
NIL
;; incomplete input
(parse '(< < < abc))
=> (((ABC)))
NIL
Upvotes: 10
Reputation: 3085
Are you looking for a result like this?
[Node [Val "begin",Node [Val "define",Val "r",Val "10"],Node [Val "*",Val "pi",Node [Val "*",Val "r",Val "r"]]]]
Here is an approach:
data Ast x = Val x | Node [Ast x] deriving (Show)
parseh :: [[Char]] -> [Ast [Char]] -> (Ast [Char], [String])
parseh [] as = (Node as, [])
parseh (x : xs) as
| x == "(" = (let (a, xs') = (parseh xs []) in (parseh xs' (as ++ [a])))
| x == ")" = (Node as, xs)
| otherwise = parseh xs (as ++ [Val x])
parse :: [[Char]] -> [Ast [Char]]
parse xs = let (Node as, _) = parseh xs [] in as
Upvotes: 1