Reputation: 21
I am currently writing a parser with yecc in Erlang.
Nonterminals expression.
Terminals '{' '}' '+' '*' 'atom' 'app' 'integer' 'if0' 'fun' 'rec'.
Rootsymbol expression.
expression -> '{' '+' expression expression '}' : {'AddExpression', '$3','$4'}.
expression -> '{' 'if0' expression expression expression '}' : {'if0', '$3', '$4', '$5'}.
expression -> '{' '*' expression expression '}' : {'MultExpression', '$3','$4'}.
expression -> '{' 'app' expression expression '}' : {'AppExpression', '$3','$4'}.
expression -> '{' 'fun' '{' expression '}' expression '}': {'FunExpression', '$4', '$6'}.
expression -> '{' 'rec' '{' expression expression '}' expression '}' : {'RecExpression', '$4', '$5', '$7'}.
expression -> atom : '$1'.
expression -> integer : '$1'.
I also have an Erlang project that tokenizes the input before parsing:
tok(X) ->
    element(2, erl_scan:string(X)).
get_Value(X) ->
    element(2, parse(tok(X))).
These cases are accepted:
interp:get_Value("{+ {+ 4 6} 6}").
interp:get_Value("{+ 4 2}").
These return {'AddExpression', {'AddExpression', {integer,1,4}, {integer,1,6}}, {integer,1,6}} and {'AddExpression', {integer,1,4}, {integer,1,2}}, respectively.
But this test case:
interp:get_Value("{if0 3 4 5}").
Returns:
{1,string_parser,["syntax error before: ","if0"]}
Upvotes: 2
Views: 348
Reputation: 20926
In the grammar rules, the terminals name the category of a token, not its value. So you can match against any atom, but not against a specific atom. If you are using the Erlang tokenizer, then the token generated for "if0" will be {atom,Line,if0}, while your grammar expects an {if0,Line} token. This is what the "Pre-processing" section of the yecc documentation is trying to explain.
You will need a special tokenizer for this. A simple way of handling this, if you want to keep using the Erlang tokenizer, is to add a pre-processing pass that scans the token list and converts {atom,Line,if0} tokens to {if0,Line} tokens.
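A minimal sketch of such a pre-processing pass might look like the following. The module name keyword_tokens and the helper names preprocess/1 and convert/1 are hypothetical, and the keyword list is an assumption read off the Terminals line in the question (if0, rec, app); adjust it to match your grammar.

```erlang
%% Hypothetical module: turns selected atom tokens into keyword tokens.
-module(keyword_tokens).
-export([tok/1, preprocess/1]).

%% Assumed keyword list, taken from the Terminals declared in the question.
%% 'fun' is not listed because it is a reserved word in Erlang, so
%% erl_scan already emits it as {'fun', Line}.
-define(KEYWORDS, [if0, rec, app]).

%% Tokenize with the standard Erlang scanner, then rewrite keyword atoms.
tok(String) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(String),
    preprocess(Tokens).

%% Walk the token list and convert each token independently.
preprocess(Tokens) ->
    [convert(Token) || Token <- Tokens].

%% {atom, Line, if0} becomes {if0, Line}; every other token is unchanged.
convert({atom, Line, Name} = Token) ->
    case lists:member(Name, ?KEYWORDS) of
        true  -> {Name, Line};
        false -> Token
    end;
convert(Token) ->
    Token.
```

With this in place, the tok/1 from the question could be replaced by keyword_tokens:tok/1, so that the parser sees {if0,1} rather than {atom,1,if0} for the input "{if0 3 4 5}". Atoms that are not in the keyword list still reach the grammar as ordinary atom tokens.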
Upvotes: 1