I'm using yecc to parse my tokenized, assembly-like code. For input such as `"MOV [1], [2]\nJMP hello"`, this is the token list I get from the lexer:
```elixir
[{:opcode, 1, :MOV}, {:register, 1, 1}, {:",", 1}, {:register, 1, 2},
 {:opcode, 2, :JMP}, {:identifer, 2, :hello}]
```
When I parse it, I get:

```elixir
[%{operation: [:MOV, [:REGISTER, 1], [:REGISTER, 2]]},
 %{operation: [:JMP, [:CONST, :hello]]}]
```
However, I want every operation to carry its line number, so that later stages can report meaningful errors. So I changed my parser to this:
```erlang
Nonterminals
code statement operation value.

Terminals
label identifer integer ',' opcode register address address_in_register line_number.

Rootsymbol code.

code -> line_number statement : [{get_line('$1'), '$2'}].
code -> line_number statement code : [{get_line('$1'), '$2'} | '$3'].
%code -> statement : ['$1'].
%code -> statement code : ['$1' | '$2'].

statement -> label : #{'label' => label('$1')}.
statement -> operation : #{'operation' => '$1'}.

operation -> opcode value ',' value : [operation('$1'), '$2', '$4'].
operation -> opcode value : [operation('$1'), '$2'].
operation -> opcode identifer : [operation('$1'), value('$2')].
operation -> opcode : [operation('$1')].

value -> integer : value('$1').
value -> register : value('$1').
value -> address : value('$1').
value -> address_in_register : value('$1').

Erlang code.

get_line({_, Line, _}) -> Line.
operation({opcode, _, OpcodeName}) -> OpcodeName.
label({label, _, Value}) -> Value.
value({identifer, _, Value}) -> ['CONST', Value];
value({integer, _, Value}) -> ['CONST', Value];
value({register, _, Value}) -> ['REGISTER', Value];
value({address, _, Value}) -> ['ADDRESS', Value];
value({address_in_register, _, Value}) -> ['ADDRESS_IN_REGISTER', Value].
```
(The commented-out `code` rules are the old, working version.)
Now, with the same input, I'm getting:

```elixir
{:error, {1, :assembler_parser, ['syntax error before: ', ['\'MOV\'']]}}
```

How can I fix this?
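The error occurs because your new `code` rules require a `line_number` token in front of every statement, but the lexer never produces one: the first token it emits is `{:opcode, 1, :MOV}`, so yecc reports a syntax error before `'MOV'`. Each token already carries its line number as its second element, so my suggestion is to keep the line numbers in the tokens rather than as separate tokens, go back to the old `code` rules, and change how you build the operations instead. I would suggest this: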
```erlang
operation -> opcode value ',' value : [operation('$1'), line('$1'), '$2', '$4'].
operation -> opcode value : [operation('$1'), line('$1'), '$2'].
operation -> opcode identifer : [operation('$1'), line('$1'), value('$2')].
operation -> opcode : [operation('$1'), line('$1')].

%% In the Erlang code section:
line({_, Line, _}) -> Line.
```
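Putting the pieces together, a complete grammar file might look like the sketch below. It assumes the file is still named `assembler_parser.yrl` (taken from the error message), restores the question's commented-out `code` rules, drops the now-unused `line_number` terminal, and uses the list-based variant above; everything else is the question's own grammar.

```erlang
%% assembler_parser.yrl: the question's grammar with the original
%% `code` rules restored and the line number taken from the opcode token.

Nonterminals
code statement operation value.

Terminals
label identifer integer ',' opcode register address address_in_register.

Rootsymbol code.

code -> statement : ['$1'].
code -> statement code : ['$1' | '$2'].

statement -> label : #{'label' => label('$1')}.
statement -> operation : #{'operation' => '$1'}.

%% The line comes from the opcode token itself: {opcode, Line, Name}.
operation -> opcode value ',' value : [operation('$1'), line('$1'), '$2', '$4'].
operation -> opcode value : [operation('$1'), line('$1'), '$2'].
operation -> opcode identifer : [operation('$1'), line('$1'), value('$2')].
operation -> opcode : [operation('$1'), line('$1')].

value -> integer : value('$1').
value -> register : value('$1').
value -> address : value('$1').
value -> address_in_register : value('$1').

Erlang code.

line({_, Line, _}) -> Line.
operation({opcode, _, OpcodeName}) -> OpcodeName.
label({label, _, Value}) -> Value.
value({identifer, _, Value}) -> ['CONST', Value];
value({integer, _, Value}) -> ['CONST', Value];
value({register, _, Value}) -> ['REGISTER', Value];
value({address, _, Value}) -> ['ADDRESS', Value];
value({address_in_register, _, Value}) -> ['ADDRESS_IN_REGISTER', Value].
```

With the file in a Mix project's `src/` directory (Mix compiles `.yrl` files there with yecc), parsing the token list from the question should return `{:ok, [%{operation: [:MOV, 1, [:REGISTER, 1], [:REGISTER, 2]]}, %{operation: [:JMP, 2, [:CONST, :hello]]}]}`, with each line number placed right after the opcode.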
Or, if you want to mirror the Elixir AST, where every node is a `{name, metadata, arguments}` tuple:
```erlang
operation -> opcode value ',' value : {operation('$1'), meta('$1'), ['$2', '$4']}.
operation -> opcode value : {operation('$1'), meta('$1'), ['$2']}.
operation -> opcode identifer : {operation('$1'), meta('$1'), [value('$2')]}.
operation -> opcode : {operation('$1'), meta('$1'), []}.

%% In the Erlang code section:
meta({_, Line, _}) -> [{line, Line}].
```
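With the tuple form, a later pass can read the line out of the metadata when it reports an error, which is what the line numbers were wanted for in the first place. The module below is a hypothetical illustration: the name `asm_check`, the function `check/1`, and the rule that `JMP` takes exactly one operand are assumptions made for the example, not part of the original grammar.

```erlang
-module(asm_check).
-export([check/1]).

%% Hypothetical post-parse pass over the output of the tuple-style grammar.
%% Each statement is either #{label => Name} or
%% #{operation => {Opcode, Meta, Args}} with Meta = [{line, Line}].
%% Returns ok, or {error, Line, Message} for the first JMP that does not
%% have exactly one operand (an assumed rule, for illustration only).
check([]) ->
    ok;
check([#{operation := {'JMP', Meta, Args}} | _Rest]) when length(Args) =/= 1 ->
    {error, proplists:get_value(line, Meta), "JMP expects exactly one operand"};
check([_Statement | Rest]) ->
    %% Labels and all other operations are accepted as-is.
    check(Rest).
```

From Elixir this could be called as `:asm_check.check(statements)` on the list inside the parser's `{:ok, statements}` result.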