Reputation: 2745
The very minimal grammar shown here has the sole purpose of illustrating the problem I encountered. No meaning attached.
Example inputs and outputs:
IN: select TOP [1]
OUT: Select TOP (ConstInt 1)
IN: select MIN [AVG(2,3)]
OUT: Select MIN (Function "AVG" [ConstInt 2,ConstInt 3])
IN: select TOP [MIN(2,3)]
OUT: Parse error. Unexpected token TokAssumption "MIN"
The reason for the parse error is that this grammar allows string MIN
to be both an "assumption" (the keyword right after select
) and a function name. In the last example, it is parsed as an assumption, while it should have been parsed as a function name in that position.
When MIN
is encountered, the first lexer rule is the one that reads it as a TokAssumption
, so it is not read as a TokIdentifier
.
My question is how to handle this. Obviously, the simplest solution would be make sure assumptions and function names cannot overlap. But this is not possible in the actual parser I'm trying it fix.
My intuition is that this should be possible by using a monadic action from the parser, to check the lookahead token. When the lookahead token is a (
, it means the current token should be lexed as a TokIdentifier
.
Am I looking in the right direction? Is there a better way to handle such cases?
This is the Happy parser for this silly grammar:
{
module Parser (
parser,
L.runAlex
) where
import Data
import qualified Lexer as L
}
%name parser algexpr
%tokentype { L.Token }
%error { parseError }
%monad { L.Alex } { >>= } { pure }
%lexer { lexer } { L.EOF }
%token
'(' { L.TokLParen }
')' { L.TokRParen }
'[' { L.TokLSquare }
']' { L.TokRSquare }
',' { L.TokComma }
'select' { L.TokSelect }
int { L.TokConstInt $$ }
ident { L.TokIdentifier $$ }
ass { L.TokAssumption $$ }
%%
commaSep(p)
: commaSep_(p) { reverse $1 }
commaSep_(p)
: {- empty -} { [] }
| p { [$1] }
| commaSep_(p) ',' p { $3 : $1 }
algexpr :: { AlgExpr }
: 'select' assumption '[' expr ']' { Select $2 $4 }
assumption :: { Assumption }
: ass { read $1 }
expr :: { Expr }
: int { ConstInt $1 }
| ident '(' commaSep(expr) ')' { Function $1 $3 }
{
parseError :: L.Token -> L.Alex a
parseError tok = do
(L.AlexPn _ line column, _, _, _) <- L.alexGetInput
L.alexError $ "Parse error. Unexpected token " <> show tok <> " at line " <> show line <> ", column " <> show column
lexer :: (L.Token -> L.Alex a) -> L.Alex a
lexer = (=<< L.alexMonadScan)
}
This is the Alex lexer:
{
module Lexer (
Alex
, AlexPosn (..)
, alexGetInput
, alexError
, runAlex
, alexMonadScan
, Token (..)
) where
}
%wrapper "monad"
$digit = 0-9
$alpha = [a-zA-Z]
@ident = $alpha [$alpha $digit _]*
tokens :-
<0> $white+ { skip }
<0> "(" { tok TokLParen }
<0> ")" { tok TokRParen }
<0> "[" { tok TokLSquare }
<0> "]" { tok TokRSquare }
<0> "," { tok TokComma }
<0> select { tok TokSelect }
<0> MAX { tokS TokAssumption }
<0> MIN { tokS TokAssumption }
<0> TOP { tokS TokAssumption }
<0> $digit+ { tokInt }
<0> @ident { tokS TokIdentifier }
{
data Token
= TokLParen
| TokRParen
| TokLSquare
| TokRSquare
| TokComma
| TokAssumption String
| TokSelect
| TokConstInt Int
| TokIdentifier String
| EOF
deriving (Eq, Show)
alexEOF :: Alex Token
alexEOF = pure EOF
tok :: Token -> AlexAction Token
tok ctor _ _ = pure ctor
tokS :: (String -> Token) -> AlexAction Token
tokS f (_, _, _, str) len = pure $ f $ take len str
tokInt :: AlexAction Token
tokInt (_, _, _, str) len = pure $ TokConstInt $ read $ take len str
}
Data.hs
:
module Data( AlgExpr(..), Expr(..), Assumption(..) ) where
data AlgExpr = Select Assumption Expr
deriving (Eq,Show)
data Expr = ConstInt Int
| Function String [Expr]
deriving (Eq,Show)
data Assumption = MAX | MIN | TOP
deriving (Eq,Show,Read)
Upvotes: 2
Views: 93