Reputation: 31
I need help with this program I am trying to write in Haskell. I have written most of it, and here is what I am basically trying to do:
parse "a + b"
in the terminal I want this as output:
Plus (Word "a") (Word "b")
parse "a - 2 * b + c"
in the terminal I want this as output:
Minus (Word "a") (Plus (Mult (Num 2) (Word "b")) (Word "c"))
My code so far:
data Ast
= Word String
| Num Int
| Mult Ast Ast
| Plus Ast Ast
| Minus Ast Ast
deriving (Eq, Show)
tokenize :: [Char] -> [String]
tokenize [] = []
tokenize (' ' : s) = tokenize s
tokenize ('+' : s) = "+" : tokenize s
tokenize ('*' : s) = "*" : tokenize s
tokenize (c : s)
| isDigit c =
let (cs, s') = collectWhile isDigit s
in (c : cs) : tokenize s'
| isAlpha c =
let (cs, s') = collectWhile isAlpha s
in (c : cs) : tokenize s'
| otherwise = error ("unexpected character " ++ show c)
collectWhile :: (Char -> Bool) -> String -> (String, String)
collectWhile p s = (takeWhile p s, dropWhile p s)
isDigit, isAlpha :: Char -> Bool
isDigit c = c `elem` ['0' .. '9']
isAlpha c = c `elem` ['a' .. 'z'] ++ ['A' .. 'Z']
parseU :: [String] -> (Ast, [String])
parseU ("+" : s0) =
let (e1, s1) = parseU s0
(e2, s2) = parseU s1
in (Plus e1 e2, s2)
parseU ("*" : s0) =
let (e1, s1) = parseU s0
(e2, s2) = parseU s1
in (Mult e1 e2, s2)
parseU (t : ts)
| isNumToken t = (Num (read t), ts)
| isWordToken t = (Word t, ts)
| otherwise = error ("unrecognized token " ++ show t)
parseU [] = error "unexpected end of input"
isNumToken, isWordToken :: String -> Bool
isNumToken xs = takeWhile isDigit xs == xs
isWordToken xs = takeWhile isAlpha xs == xs
parse :: String -> Ast
parse s =
case parseU (tokenize s) of
(e, []) -> e
(_, t : _) -> error ("unexpected token " ++ show t)
inn :: Ast -> String
inn (Plus x y) = innP x ++ " + " ++ innP y
inn (Mult x y) = innP x ++ " * " ++ innP y
inn ast = innP ast
innP :: Ast -> String
innP (Num n) = show n
innP (Plus x y) = "(" ++ innP x ++ " + " ++ innP y ++ ")"
innP (Mult x y) = "(" ++ innP x ++ " * " ++ innP y ++ ")"
innP (Word w) = w --
innfiks :: String -> String
innfiks s = inn (parse s)
Right now I get an error posting the text I wrote in the terminal, but when I write it like this:
parse "+ a b"
I get the correct output:
Plus (Word "a") (Word "b")
I know I have to change the code so it accepts what I send to the parse function on this form:
value operator value,
and not on this form:
operator value value
But im struggling to find out what and where I have to do this change.
Upvotes: 3
Views: 993
Reputation: 51039
To handle infix operators with precedence, one approach is to introduce a sequence of parsing functions corresponding to the precedence levels. So, if you have "factors" that can be multiplied to create "terms" which can be added or subtracted to create "expressions", you'll want to create parser functions for each of these levels. Parsing a "factor" (i.e., a word or number) is easy, since you've already written the code:
parseFactor :: [String] -> (Ast, [String])
parseFactor (t : ts)
| isNumToken t = (Num (read t), ts)
| isWordToken t = (Word t, ts)
| otherwise = error ("unrecognized token " ++ show t)
parseFactor [] = error "unexpected end of input"
Parsing a term is trickier. You want to start by parsing a factor and then, optionally, a *
followed by another factor, and then treat that as a term to be further optionally multiplied by another factor, and so on. The following is one way to do it:
parseTerm :: [String] -> (Ast, [String])
parseTerm ts
= let (f1, ts1) = parseFactor ts -- parse first factor
in go f1 ts1
where go acc ("*":ts2) -- add a factor to an accumulating term
= let (f2, ts3) = parseFactor ts2
in go (Mult acc f2) ts3
go acc rest = (acc, rest) -- no more factors: return the term
If you want, try writing a similar parseExpr
to parse terms separated by +
characters (skipping subtraction for now), and test it on something like:
parseExpr (tokenize "2 + 3 * 6 + 4 * 8 * 12 + 1")
For spoilers, here's a version that handles both +
and -
, though note that your tokenizer doesn't yet handle subtraction correctly, so you'll have to fix that first.
parseExpr :: [String] -> (Ast, [String])
parseExpr ts
= let (f1, ts1) = parseTerm ts
in go f1 ts1
where go acc (op:ts2)
| op == "+" || op == "-"
= let (f2, ts3) = parseTerm ts2
in go ((astOp op) acc f2) ts3
go acc rest = (acc, rest)
astOp "+" = Plus
astOp "-" = Minus
With that in place, you can point parse
to the right parser:
parse :: String -> Ast
parse s =
case parseExpr (tokenize s) of
(e, []) -> e
(_, t : _) -> error ("unexpected token " ++ show t)
and your example should work:
λ> parse "a - 2 * b + c"
Plus (Minus (Word "a") (Mult (Num 2) (Word "b"))) (Word "c")
Note that this is slightly different than the output you said you wanted, but this ordering is correct for left-associative operators (which is important for correctly handling -
). That is, you want:
5 - 4 + 1
to parse as:
(5 - 4) + 1 -- i.e., (Plus (Minus (Num 5) (Num 4)) (Num 1))
so that an evaluator will calculate the correct answer of 2. If you parse it as:
5 - (4 + 1) -- i.e., (Minus (Num 5) (Plus (Num 4) (Num 1)))
your evaluator would calculate the wrong answer of 0.
However, if you really want to parse with right-associative operators, see below.
The full modified code for left-associative operators:
data Ast
= Word String
| Num Int
| Mult Ast Ast
| Plus Ast Ast
| Minus Ast Ast
deriving (Eq, Show)
tokenize :: [Char] -> [String]
tokenize [] = []
tokenize (' ' : s) = tokenize s
tokenize ('-' : s) = "-" : tokenize s
tokenize ('+' : s) = "+" : tokenize s
tokenize ('*' : s) = "*" : tokenize s
tokenize (c : s)
| isDigit c =
let (cs, s') = collectWhile isDigit s
in (c : cs) : tokenize s'
| isAlpha c =
let (cs, s') = collectWhile isAlpha s
in (c : cs) : tokenize s'
| otherwise = error ("unexpected character " ++ show c)
collectWhile :: (Char -> Bool) -> String -> (String, String)
collectWhile p s = (takeWhile p s, dropWhile p s)
isDigit, isAlpha :: Char -> Bool
isDigit c = c `elem` ['0' .. '9']
isAlpha c = c `elem` ['a' .. 'z'] ++ ['A' .. 'Z']
parseFactor :: [String] -> (Ast, [String])
parseFactor (t : ts)
| isNumToken t = (Num (read t), ts)
| isWordToken t = (Word t, ts)
| otherwise = error ("unrecognized token " ++ show t)
parseFactor [] = error "unexpected end of input"
parseTerm :: [String] -> (Ast, [String])
parseTerm ts
= let (f1, ts1) = parseFactor ts
in go f1 ts1
where go acc ("*":ts2)
= let (f2, ts3) = parseFactor ts2
in go (Mult acc f2) ts3
go acc rest = (acc, rest)
parseExpr :: [String] -> (Ast, [String])
parseExpr ts
= let (f1, ts1) = parseTerm ts
in go f1 ts1
where go acc (op:ts2)
| op == "+" || op == "-"
= let (f2, ts3) = parseTerm ts2
in go ((astOp op) acc f2) ts3
go acc rest = (acc, rest)
astOp "+" = Plus
astOp "-" = Minus
isNumToken, isWordToken :: String -> Bool
isNumToken xs = takeWhile isDigit xs == xs
isWordToken xs = takeWhile isAlpha xs == xs
parse :: String -> Ast
parse s =
case parseExpr (tokenize s) of
(e, []) -> e
(_, t : _) -> error ("unexpected token " ++ show t)
For right-associative operators, modify these definitions:
parseTerm :: [String] -> (Ast, [String])
parseTerm ts
= let (fct, ts1) = parseFactor ts
in case ts1 of
"*":ts2 -> let (trm, rest) = parseTerm ts2
in (Mult fct trm, rest)
_ -> (fct, ts1)
parseExpr :: [String] -> (Ast, [String])
parseExpr ts
= let (trm, ts1) = parseTerm ts
in case ts1 of
op:ts2 | op == "+" || op == "-"
-> let (expr, rest) = parseExpr ts2
in (astOp op trm expr, rest)
_ -> (trm, ts1)
where astOp "+" = Plus
astOp "-" = Minus*
Upvotes: 3