antlr4, trivial grammar, token recognition errors

As a complete beginner in antlr4, I haven't been able to make any use of the answer to a similar question. It looks to me that fragments are only called in my grammar by terminal rules, but still the parser is throwing the following error when submitted the string "myIdentifier":

line 1:0 token recognition error at: 'm'
line 1:1 token recognition error at: 'y'
line 1:2 token recognition error at: 'I'
line 1:3 token recognition error at: 'd'
line 1:4 token recognition error at: 'e'
line 1:5 token recognition error at: 'n'
line 1:6 token recognition error at: 't'
line 1:7 token recognition error at: 'i'
line 1:8 token recognition error at: 'f'
line 1:9 token recognition error at: 'i'
line 1:10 token recognition error at: 'e'
line 1:11 token recognition error at: 'r'

My grammar is this:

grammar Sable;

options {

}

@header {
    package org.sable.parser.gen;
}

IDENTIFIER:
    (IdentifierHead IdentifierCharacter*)
    | ('`'(IdentifierHead IdentifierCharacter*)'`')
    ;

WS  :  [ \u0020\u000C\u000A\u000D\u0009u000B\u000C]+ -> skip
    ;

COMMENT
    :   '/*' .*? '*/' -> channel(HIDDEN)
    ;

LINE_COMMENT
    :   '//' ~[\u000A\u000D]* -> channel(HIDDEN)
    ;




// NOTE: a file with zero statements is allowed because
// it can contain just comments.
sourceFile:
    statement* EOF;

statement:
    expression ';'?;

// Req. not existing any valid expression starting from
// an equals sign or any other assignment operator.
expression:
    valuedExpression (assignmentOperator valuedExpression)?;

valuedExpression:
    IDENTIFIER
    ;

assignmentOperator:
    '='
    | '*='
    | '/='
    | '%='
    | '+='
    | '-='
    | '<<='
    | '>>='
    | '&='
    | '^='
    | '|='
    ;

fragment DecimalDigit:
    '0'..'9'
    ;

fragment IdentifierHead:
    'a'..'z'
    | 'A'..'Z'
    | '_'
    | '\u00A8'
    | '\u00AA'
    | '\u00AD'
    | '\u00AF' |
    '\u00B2'..'\u00B5' |
    '\u00B7'..'\u00BA'  |
    '\u00BC'..'\u00BE' |
    '\u00C0'..'\u00D6' |
    '\u00D8'..'\u00F6' |
    '\u00F8'..'\u00FF' |
    '\u0100'..'\u02FF' |
    '\u0370'..'\u167F' |
    '\u1681'..'\u180D' |
    '\u180F'..'\u1DBF' |
    '\u1E00'..'\u1FFF' |
    '\u200B'..'\u200D' |
    '\u202A'..'\u202E' |
    '\u203F'..'\u2040' |
    '\u2054' |
    '\u2060'..'\u206F' |
    '\u2070'..'\u20CF' |
    '\u2100'..'\u218F' |
    '\u2460'..'\u24FF' |
    '\u2776'..'\u2793' |
    '\u2C00'..'\u2DFF' |
    '\u2E80'..'\u2FFF' |
    '\u3004'..'\u3007' |
    '\u3021'..'\u302F' |
    '\u3031'..'\u303F' |
    '\u3040'..'\uD7FF' |
    '\uF900'..'\uFD3D' |
    '\uFD40'..'\uFDCF' |
    '\uFDF0'..'\uFE1F' |
    '\uFE30'..'\uFE44' |
    '\uFE47'..'\uFFFD'
    ;
fragment IdentifierCharacter:
    DecimalDigit
    | '\u0300'..'\u036F'
    | '\u1DC0'..'\u1DFF'
    | '\u20D0'..'\u20FF'
    | '\uFE20'..'\uFE2F'
    | IdentifierHead
    ;

What am I doing wrongly? My assumptions are:

Upvotes: 0

Views: 782

Answers (1)

On the basis of Bart Kiers' comment:

When I test your grammar, "myIdentifier" is tokenised as a IDENTIFIER. Perhaps you need to regenerate your lexer/parser? With and without the empty options { } block it works just fine.

it turned out that the problem was in my environment rather than in my grammar. I was using Certiv's antlr4 support plugins for Eclipse to generate my grammar. As soon as I began to generate my grammar using antlr4 from the command line, the errors disappeared.

Upvotes: 1

Related Questions