Mr.Bloom

Reputation: 365

Haskell: Parsing a file finishes after first expression despite more input in file

The following is an example program in a language for which I'm writing a parser.

n := 1  
Do (1)->          -- The 1 in brackets is a placeholder for a Boolean or relational expression.
        n := 1 + 1
Od

When the program looks like this, the parseFile function stops after the assignment on the first line. When that assignment is removed, the rest parses as expected. Below is how it's called in GHCi, first with the first line present, then with it removed:

λ > parseFile "example.hnry"
Assign "n" (HInteger 1)

λ > parseFile "example.hnry"
Do (HInteger 1) (Assign "n" (AExpr (HInteger 1) Add (HInteger 1)))

The expected output would look similar to this:

λ > parseFile "example.hnry"
Assign "n" (HInteger 1) Do (HInteger 1) (Assign "n" (AExpr (HInteger 1) Add (HInteger 1)))

I first assumed it had something to do with the assignment parser, but the body of the loop contains an assignment that parses as expected, so I was able to rule that out. I believe the issue is within the parseFile function itself. The following is the parseFile function, along with the other functions that make up the parseExpression parser I'm using to parse a program.

I think the error is in parseFile because it parses an expression only once and doesn't "loop" (for want of a better word) back to itself to check whether there's more input left to parse. I think that's the error, but I'm not quite sure.

parseFile :: String -> IO HVal
parseFile file =
    do program <- readFile file
       case parse parseExpression "" program of
         Left  err    -> fail "Parse Error"
         Right parsed -> return $ parsed

parseExpression :: Parser HVal
parseExpression = parseAExpr <|> parseDo <|> parseAssign

parseDo :: Parser HVal
parseDo = do
   _  <- string "Do "
   _  <- char '('
   x  <- parseHVal  -- Will be changed to a Boolean expression
   _  <- string ")->"
   spaces
   y  <- parseExpression
   spaces
   _  <- string "Od"
   return $ Do x y

parseAExpr :: Parser HVal
parseAExpr = do
   x  <- parseInteger
   spaces
   op <- parseOp
   spaces
   y  <- parseInteger <|> do
                _ <- char '('
                z <- parseAExpr
                _ <- char ')'
                return $ z
   return $ AExpr x op y

parseAssign :: Parser HVal
parseAssign = do
 var <- oneOf ['a'..'z'] <|> oneOf ['A'..'Z']
 spaces
 _   <- string ":="
 spaces
 val <- parseHVal <|> do
                    _ <- char '('
                    z <- parseAExpr
                    _ <- char ')'
                    return $ z
 return $ Assign [var] val

Upvotes: 2

Views: 85

Answers (1)

K. A. Buhr

Reputation: 50829

As you note, your parseFile function parses a single expression (though maybe "statement" would be a better name) using the parseExpression parser. You probably want to introduce a new parser for a "program" or sequence of expressions/statements:

parseProgram :: Parser [HVal]
parseProgram = spaces *> many (parseExpression <* spaces)

and then in parseFile, replace parseExpression with parseProgram:

parseFile :: String -> IO [HVal]
parseFile file =
    do program <- readFile file
       case parse parseProgram "" program of
         Left  err    -> fail "Parse Error"
         Right parsed -> return $ parsed

Note that I've had to change the type here from HVal to [HVal] to reflect the fact that a program, being a sequence of expressions each of type HVal, needs to be represented as some sort of data type capable of combining multiple HVals together, and a list [HVal] is one way of doing so.
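To make the difference concrete, here is a minimal, self-contained sketch of the parseProgram idea. It rests on several assumptions: the question doesn't show the real HVal type or parseInteger, so the cut-down versions below are hypothetical stand-ins, and parseAssign is simplified to accept only integer right-hand sides. The trailing eof is an addition of mine, not part of the parser above. Without it, many stops quietly at the first statement it can't start, and the parse still reports success.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical, cut-down AST: the question does not show the real HVal type.
data HVal
  = HInteger Integer
  | Assign String HVal
  deriving (Show, Eq)

-- Parse a non-negative integer literal (stand-in for the question's parseInteger).
parseInteger :: Parser HVal
parseInteger = HInteger . read <$> many1 digit

-- Parse an assignment such as "n := 1" (single-letter variable, integer value only).
parseAssign :: Parser HVal
parseAssign = do
  var <- letter
  spaces
  _   <- string ":="
  spaces
  val <- parseInteger
  return $ Assign [var] val

-- Parse zero or more statements, then insist that no input is left over.
parseProgram :: Parser [HVal]
parseProgram = spaces *> many (parseAssign <* spaces) <* eof
```

Loaded into GHCi, the single-statement parser succeeds but ignores the second line, while the program parser consumes both:

λ > parse parseAssign "" "n := 1\nm := 2\n"
Right (Assign "n" (HInteger 1))

λ > parse parseProgram "" "n := 1\nm := 2\n"
Right [Assign "n" (HInteger 1),Assign "m" (HInteger 2)]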

If you want a program to be an HVal instead of an [HVal], then you need to introduce a new constructor in your HVal type that's capable of representing programs. One method is to use a constructor to directly represent a block of statements:

data HVal = ... | Block [HVal]

Another is to add a constructor to represent a sequence of two statements:

data HVal = ... | Seq HVal HVal

Both methods are used in real parsers. (Note that you'd normally pick one; you wouldn't use both.) To represent a sequence of three assignment statements, for example, the block method would do it directly as a list:

Block [Assign "a" (HInteger 1), Assign "b" (HInteger 2), Assign "c" (HInteger 3)]

while the two-statement sequence method would build a sort of nested tree:

Seq (Assign "a" (HInteger 1)) (Seq (Assign "b" (HInteger 2))
                                   (Assign "c" (HInteger 3)))

The appropriate parsers for these two alternatives, both of which return a plain HVal, might be:

-- use blocks
parseProgram1 :: Parser HVal
parseProgram1 = do
  spaces
  xs <- many (parseExpression <* spaces)
  return $ Block xs

-- use sequences
parseProgram2 :: Parser HVal
parseProgram2 = do
  spaces
  x <- parseExpression
  spaces
  (do xs <- parseProgram2
      return $ Seq x xs)
    <|> return x
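The two representations carry the same information, which a small conversion sketch can show. The HVal type below is a hypothetical, cut-down version that includes both constructors only so they can be compared side by side, and toSeq and fromSeq are illustrative helper names, not part of any library. One difference worth noticing: parseProgram1 accepts an empty program (Block []), whereas parseProgram2 requires at least one statement, which is why toSeq returns a Maybe.

```haskell
-- Hypothetical, cut-down AST carrying both constructors purely for comparison.
data HVal
  = HInteger Integer
  | Assign String HVal
  | Block [HVal]
  | Seq HVal HVal
  deriving (Show, Eq)

-- Fold a non-empty list of statements into a right-nested Seq chain.
-- Returns Nothing for an empty list, since Seq cannot represent an empty program.
toSeq :: [HVal] -> Maybe HVal
toSeq [] = Nothing
toSeq xs = Just (foldr1 Seq xs)

-- Flatten a nested Seq tree back into its list of statements, in order.
fromSeq :: HVal -> [HVal]
fromSeq (Seq a b) = fromSeq a ++ fromSeq b
fromSeq x         = [x]
```

Applying toSeq to the three assignments from the Block example yields the nested Seq tree shown earlier, and fromSeq turns that tree back into the original list.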

Upvotes: 4
