Reputation: 2085
I'm trying to get the following lexer rule to match operator identifiers, but ANTLR won't accept it, specifically the alternatives that match == and .. :
Symbol
: ( U_Sm
| U_So
| U_Sc
| '\u0080' .. '\u007F'
| '==' '='*
| '..' '.'*
)+
;
The rules with U_ prefixes refer to Unicode group lexer fragments. I have removed the = character from the U_Sm fragment, so that shouldn't be an issue.
Valid identifiers could be the following:
==
!==
<<~
..
!..
<...
±϶⇎
Invalid identifiers would be:
. (Member access is done separately)
= (Assignment is done separately)
!. (Single dots or single equals signs in an identifier are disallowed)
As you can see, the rule can include two or more equals signs or full stops in an identifier (e.g. !.., <...>, !==) or as the whole identifier (e.g. .., ==, ===), but not one.
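To pin down the intended behaviour independently of ANTLR, here is a minimal Python sketch of the rules described above. It is only a reference model, not the grammar itself. It assumes that Python's `unicodedata` categories Sm, So and Sc stand in for the U_Sm, U_So and U_Sc fragments, and that a hypothetical `EXTRA_SYMBOL_CHARS` set supplies `!`, which the examples treat as valid even though it is not in any of those categories. The names `is_plain_symbol_char` and `is_valid_symbol` are made up for illustration.

```python
import unicodedata

# Stand-ins for the U_Sm / U_So / U_Sc fragments (assumption: the
# fragments roughly follow these Unicode general categories).
SYMBOL_CATEGORIES = {"Sm", "So", "Sc"}

# Hypothetical extra characters: the examples use '!', which is not in
# Sm, So or Sc, so it is added here purely as an assumption.
EXTRA_SYMBOL_CHARS = {"!"}


def is_plain_symbol_char(ch):
    """True for characters that may appear singly in an identifier."""
    if ch in "=.":
        # '=' and '.' are handled separately: they need runs of two or more.
        return False
    return ch in EXTRA_SYMBOL_CHARS or unicodedata.category(ch) in SYMBOL_CATEGORIES


def is_valid_symbol(text):
    """Check the stated rule: any run of '=' or '.' must have length >= 2."""
    if not text:
        return False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in "=.":
            # Consume the whole run of this character (greedy).
            j = i
            while j < len(text) and text[j] == ch:
                j += 1
            if j - i < 2:
                return False  # a single '=' or '.' is disallowed
            i = j
        elif is_plain_symbol_char(ch):
            i += 1
        else:
            return False  # not an operator character at all
    return True


for candidate in ["==", "!==", "<<~", "..", "!..", "<...", "±϶⇎", ".", "=", "!."]:
    print(repr(candidate), is_valid_symbol(candidate))
```

The run of `=` or `.` is consumed greedily before its length is checked, which is the behaviour the `'==' '='*` and `'..' '.'*` alternatives are meant to express.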
The errors and warnings given by the compiler are the following:
warning(200): Hydra.g3:6:18: Decision can match input such as "'='<EOT>" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
warning(200): Hydra.g3:7:18: Decision can match input such as "'.'<EOT>" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
error(201): Hydra.g3:8:9: The following alternatives can never be matched: 4
A test grammar to reproduce the errors:
grammar Test;
program :
Symbol
;
Symbol
: ( U_Sm
| U_So
| U_Sc
| '\u0080' .. '\u007F'
| '==' '='*
| '..' '.'*
)+
;
fragment U_Sm
: '\u002B'
| '\u003C' | '\u003E'
| '\u007C' | '\u007E'
| '\u00AC' | '\u00B1'
| '\u00D7' | '\u00F7'
| '\u03F6'
| '\u0606' .. '\u0608'
| '\u2044' | '\u2052'
| '\u207A' .. '\u207C'
| '\u208A' .. '\u208C'
| '\u2140' .. '\u2144'
| '\u214B'
| '\u2190' .. '\u2194'
| '\u219A' | '\u219B'
| '\u21A0' | '\u21A3'
| '\u21A6' | '\u21AE'
| '\u21CE' | '\u21CF'
| '\u21D2' | '\u21D4'
| '\u21F4' .. '\u22FF'
| '\u2308' .. '\u230B'
| '\u2320' | '\u2321'
| '\u237C'
| '\u239B' .. '\u23B3'
| '\u23DC' .. '\u23E1'
| '\u25B7' | '\u25C1'
| '\u25F8' .. '\u25FF'
| '\u266F'
| '\u27C0' .. '\u27C4'
| '\u27C7' .. '\u27CA'
| '\u27CC'
| '\u27D0' .. '\u27E5'
| '\u27F0' .. '\u27FF'
| '\u2900' .. '\u2982'
| '\u2999' .. '\u29D7'
| '\u29DC' .. '\u29FB'
| '\u29FE' .. '\u2AFF'
| '\u2B30' .. '\u2B44'
| '\u2B47' .. '\u2B4C'
| '\uFB29' | '\uFE62'
| '\uFE64' .. '\uFE66'
| '\uFF0B'
| '\uFF1C' .. '\uFF1E'
| '\uFF5C' | '\uFF5E'
| '\uFFE2'
| '\uFFE9' .. '\uFFEC'
;
fragment U_So
: '\u00A6' | '\u00A7'
| '\u00A9' | '\u00AE'
| '\u00B0' | '\u00B6'
| '\u0482' | '\u060E'
| '\u060F' | '\u06E9'
| '\u06FD' | '\u06FE'
| '\u07F6' | '\u09FA'
| '\u0B70'
| '\u0BF3' .. '\u0BF8'
| '\u0BFA' | '\u0C7F'
| '\u0CF1' | '\u0CF2'
| '\u0D79'
| '\u0F01' .. '\u0F03'
| '\u0F13' .. '\u0F17'
| '\u0F1A' .. '\u0F1F'
| '\u0F34' | '\u0F36'
| '\u0F38'
| '\u0FBE' .. '\u0FC5'
| '\u0FC7' .. '\u0FCC'
| '\u0FCE' | '\u0FCF'
| '\u0FD5' .. '\u0FD8'
| '\u109E' | '\u109F'
| '\u1360'
| '\u1390' .. '\u1399'
| '\u1940'
| '\u19E0' .. '\u19FF'
| '\u1B61' .. '\u1B6A'
| '\u1B74' .. '\u1B7C'
| '\u2100' | '\u2101'
| '\u2103' .. '\u2106'
| '\u2108' | '\u2109'
| '\u2114'
| '\u2116' .. '\u2118'
| '\u211E' .. '\u2123'
| '\u2125' | '\u2127'
| '\u2129' | '\u212E'
| '\u213A' | '\u213B'
| '\u214A' | '\u214C'
| '\u214D' | '\u214F'
| '\u2195' .. '\u2199'
| '\u219C' .. '\u219F'
| '\u21A1' | '\u21A2'
| '\u21A4' | '\u21A5'
| '\u21A7' .. '\u21AD'
| '\u21AF' .. '\u21CD'
| '\u21D0' | '\u21D1'
| '\u21D3'
| '\u21D5' .. '\u21F3'
| '\u2300' .. '\u2307'
| '\u230C' .. '\u231F'
| '\u2322' .. '\u2328'
| '\u232B' .. '\u237B'
| '\u237D' .. '\u239A'
| '\u23B4' .. '\u23DB'
| '\u23E2' .. '\u23E8'
| '\u2400' .. '\u2426'
| '\u2440' .. '\u244A'
| '\u249C' .. '\u24E9'
| '\u2500' .. '\u25B6'
| '\u25B8' .. '\u25C0'
| '\u25C2' .. '\u25F7'
| '\u2600' .. '\u266E'
| '\u2670' .. '\u26CD'
| '\u26CF' .. '\u26E1'
| '\u26E3'
| '\u26E8' .. '\u26FF'
| '\u2701' .. '\u2704'
| '\u2706' .. '\u2709'
| '\u270C' .. '\u2727'
| '\u2729' .. '\u274B'
| '\u274D'
| '\u274F' .. '\u2752'
| '\u2756' .. '\u275E'
| '\u2761' .. '\u2767'
| '\u2794'
| '\u2798' .. '\u27AF'
| '\u27B1' .. '\u27BE'
| '\u2800' .. '\u28FF'
| '\u2B00' .. '\u2B2F'
| '\u2B45' | '\u2B46'
| '\u2B50' .. '\u2B59'
| '\u2CE5' .. '\u2CEA'
| '\u2E80' .. '\u2E99'
| '\u2E9B' .. '\u2EF3'
| '\u2F00' .. '\u2FD5'
| '\u2FF0' .. '\u2FFB'
| '\u3004' | '\u3012'
| '\u3013' | '\u3020'
| '\u3036' | '\u3037'
| '\u303E' | '\u303F'
| '\u3190' | '\u3191'
| '\u3196' .. '\u319F'
| '\u31C0' .. '\u31E3'
| '\u3200' .. '\u321E'
| '\u322A' .. '\u3250'
| '\u3260' .. '\u327F'
| '\u328A' .. '\u32B0'
| '\u32C0' .. '\u32FE'
| '\u3300' .. '\u33FF'
| '\u4DC0' .. '\u4DFF'
| '\uA490' .. '\uA4C6'
| '\uA828' .. '\uA82B'
| '\uA836' | '\uA837'
| '\uA839'
| '\uAA77' .. '\uAA79'
| '\uFDFD' | '\uFFE4'
| '\uFFE8' | '\uFFED'
| '\uFFEE' | '\uFFFC'
| '\uFFFD'
;
fragment U_Sc
: '\u0024'
| '\u00A2' .. '\u00A5'
| '\u060B' | '\u09F2'
| '\u09F3' | '\u09FB'
| '\u0AF1' | '\u0BF9'
| '\u0E3F' | '\u17DB'
| '\u20A0' .. '\u20B8'
| '\uA838' | '\uFDFC'
| '\uFE69' | '\uFF04'
| '\uFFE0' | '\uFFE1'
| '\uFFE5' | '\uFFE6'
;
Upvotes: 1
Views: 516
Reputation: 170148
The range '\u0080' .. '\u007F' is invalid since 0x80 is larger than 0x7F.
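As a rough illustration of why that alternative can never match, the sketch below models a character range as an inclusive code-point interval. This is an assumption about how to read the range, not a description of ANTLR's internals, and `chars_in_range` is a hypothetical helper. An interval whose lower bound exceeds its upper bound contains no characters, which is consistent with error(201) reporting alternative 4 as unmatchable.

```python
def chars_in_range(low, high):
    """Return the characters in the inclusive code-point interval low..high."""
    return [chr(cp) for cp in range(ord(low), ord(high) + 1)]


# The range as written in the question: lower bound above upper bound.
inverted = chars_in_range("\u0080", "\u007F")

# The same bounds in ascending order.
swapped = chars_in_range("\u007F", "\u0080")

print(len(inverted))                   # 0
print([hex(ord(c)) for c in swapped])  # ['0x7f', '0x80']
```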
ANTLR also seems to have a problem with your nested repetition: ( ... ( ... )* ... )+. Even though ANTLR's + and * are greedy by default (except for .* and .+), it appears that in such nested repetitions you need to tell ANTLR explicitly whether to match greedily or non-greedily (greedily in your case).
The following rule does not produce any errors:
Symbol
: ( U_Sm
| U_So
| U_Sc
| '\u007F' .. '\u0080'
| '==' (options{greedy=true;}: '=')*
| '..' (options{greedy=true;}: '.')*
)+
;
Upvotes: 1