Scooteroy

ANTLR4 lexer seems to have a problem processing token 'AX', and no semantic predicate runs on rule REG

In the following example, the input token 'AX' causes parse errors and I can't tell why. The parse tree shows that other matches containing register tokens, such as 'DX', work fine. I tried moving 'AX' from the first alternative of the REG rule to the second and got the same parser output. I also tried adding spaces around the operands in the input, in case the parser was having trouble separating the comma. The console output also shows that the semantic predicate for the REG rule never runs. I'm only using semantic predicates to verify which lexer rule was matched.

Grammar:

grammar asm8086;

prog: line* EOF ;

line
    : (constant | instruction) NEWLINE
    | NEWLINE //empty line
    ;

constant
    : WHOLE_NUM #decimal
    | HEX_NUM   #hexadecimal
    | BIN_NUM   #binary  //binary constants have optional underscore separator for readability
    ;

instruction
    : 'DAA' //grp0
    | 'DAS'
    | 'AAA'
    | 'AAS'
    | ('NOP' | 'XCHG' 'AX,' 'AX')
    | 'CBW'
    | 'CWD' //confirm no accessory symbols hereon for no-operand group
    | 'WAIT'
    | 'PUSHF'
    | 'POPF'
    | 'SAHF'
    | 'LAHF'
    | OP_PRE? ('CMPS' |'CMPSB' | 'CMPSW' | 'LODS' |'LODSB' | 'LODSW' | 'MOVS' | 'MOVSB' | 'MOVSW') pointer?
    | OP_PRE? ('SCAS' |'SCASB' | 'SCASW' | 'STOS' |'STOSB' | 'STOSW') //no override
    | 'INTO'
    | 'IRET'
    | OP_PRE? 'XLAT' pointer? //(SEG_PRE '[BX' '+' 'AL]')?
    | 'HLT'
    | 'CMC'
    | 'CLC'
    | 'STC'
    | 'CLI'
    | 'STI'
    | 'CLD'
    | 'STD'
    | 'ADD' ((REG ',' constant) | (pointer ',' constant) | (REG ',' REG) | (pointer ',' REG)) //grp1
    | ('JNZ' | 'JMP') constant
    ;

pointer
    : SEG_PRE? '[' REG ('+' REG)? ']'
    | SEG_PRE? '[' REG ('+' REG)? ']' '+' (WHOLE_NUM | HEX_NUM)
    ;

//argument tokens
SEG_PRE
    : 'ES:' 
    | 'CS:'
    | 'SS:'
    | 'DS:'
    {System.out.print(getText()); System.out.println(" is a seg prefix");};

OP_PRE //operation or mnemonic prefix
    : ('REP' | 'REPE' | 'REPZ')
    | ('REPNZ' | 'REPNE')
    | 'LOCK'
    {System.out.print(getText()); System.out.println(" is an op prefix");};

REG
    : 'AX' | 'BX' | 'CX' | 'DX' //general word registers
    | 'AH' | 'AL' | 'BH' | 'BL' | 'CH' | 'CL' | 'DH' | 'DL' //general byte registers
    | 'SP' | 'BP' | 'SI' | 'DI' //stack and source/destination registers
    | 'ES' | 'CS' | 'SS' | 'DS' //segment registers
    {System.out.print(getText()); System.out.println(" is a reg");};

//constant tokens
HEX_NUM: '0x' HEX+ {System.out.print(getText()); System.out.println(" is a hex");};
fragment
HEX: [0-9a-fA-F] ;

BIN_NUM: '0b' BIN+ {System.out.print(getText()); System.out.println(" is a bin");};
fragment
BIN: ('0' | '1' | '_') ;

WHOLE_NUM: '0d'? WHOLE+ {System.out.print(getText()); System.out.println(" is a whole");};
fragment
WHOLE: [0-9] ;


//skipped tokens
COMMENT: ';' ~[\r\n]* -> skip ;

WS: [ \t]+ -> skip ;

NEWLINE: '\r\n' ;

Input:

REPNZ MOVSB SS:[AX + BX]
REPNZ MOVSB DS:[DX + BX]
STI
ADD AH, BX
ADD AX, BX
ADD DS:[CX], 23
ADD DS:[CX + BX], 23
ADD SS:[CX + BX], 23
ADD SS:[AX + BX], 23
ADD AH, 0b1100
ADD CX, 0xFFFF

Console output:

DS: is a seg prefix
DS: is a seg prefix
23 is a whole
DS: is a seg prefix
23 is a whole
23 is a whole
23 is a whole
0b1100 is a bin
0xFFFF is a hex
line 1:16 no viable alternative at input 'SS:[AX'
line 5:4 extraneous input 'AX,' expecting {'[', SEG_PRE, REG}
line 5:10 no viable alternative at input 'BX\r\n'
line 9:8 no viable alternative at input 'SS:[AX'

Parse tree: (image not included)

Answers (1)

Scooteroy

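To summarize the answer: the string literals 'AX,' and 'AX' in the parser rule ('NOP' | 'XCHG' 'AX,' 'AX') make ANTLR define implicit lexer tokens for those strings, and these take precedence over the explicit REG rule. The input AX was therefore tokenized as the literal 'AX' (or as 'AX,' when a comma followed it directly) and never as REG, which is why every line that used AX as a register failed while lines using DX parsed fine. The solution was to remove the string literals 'AX,' and 'AX' from the parser rule so that REG is the only rule matching that text.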
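
A minimal sketch of what the corrected rules could look like, assuming the rest of the grammar stays as posted. The first part applies the removal described above; using REG ',' REG for XCHG is an assumption made for illustration, and it accepts any register pair, so restricting it to AX, AX would have to happen in a later pass. The second part addresses the other symptom from the question. In ANTLR 4 a `{...}` block is an action that belongs to the alternative it is written in (a semantic predicate would be written `{...}?`), so in the posted grammar the print statement is attached only to the final 'DS' alternative of REG. That is consistent with the console output, where only DS: is ever reported as a segment prefix and SS: never is. Wrapping the alternatives in parentheses should make the action run for every register.

```antlr
// Sketch only: fragments meant to replace the matching rules in the posted grammar.

// 1. No 'AX,' or 'AX' literals in the parser rule, so the lexer's REG rule
//    is the only rule that can produce a token for the text AX.
instruction
    : 'NOP'
    | 'XCHG' REG ',' REG   // assumption: any register pair is accepted here
    // ... remaining alternatives as in the question ...
    ;

// 2. Parentheses group the alternatives, so the action follows the whole
//    group instead of only the last alternative ('DS').
REG
    : ( 'AX' | 'BX' | 'CX' | 'DX'                                   // general word registers
      | 'AH' | 'AL' | 'BH' | 'BL' | 'CH' | 'CL' | 'DH' | 'DL'       // general byte registers
      | 'SP' | 'BP' | 'SI' | 'DI'                                   // stack and source/destination registers
      | 'ES' | 'CS' | 'SS' | 'DS'                                   // segment registers
      )
      {System.out.print(getText()); System.out.println(" is a reg");}
    ;
```

The same parenthesized form would apply to SEG_PRE and OP_PRE, whose actions are likewise attached only to their last alternatives ('DS:' and 'LOCK').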
