6:[["$","$Le",null,{}],["$","div",null,{"className":"min-h-screen bg-gray-100 p-6","children":[["$","$Lf",null,{}],["$","script",null,{"type":"application/ld+json","dangerouslySetInnerHTML":{"__html":"{\"@context\":\"https://schema.org\",\"@type\":\"QAPage\",\"mainEntity\":{\"@type\":\"Question\",\"name\":\"ANTLR4. How to create properly unicode range lexer rules?\",\"text\":\"

In my grammar I'd like variables to be composed of Latin, Cyrillic and Mandarin characters. \\nFor this purpose I define a lexer rule like this:\\nCYRILLIC_RANGE: [\\\\u0400–\\\\u04FF];
\\nThis is what I see in my ANTLRWorks 2.1 output when I try to run an expression against my query:\\nline 1:4 token recognition error at: 'н'\\nWhat am I missing?

\\n\",\"author\":{\"@type\":\"Person\",\"name\":\"Ihor M.\"},\"upvoteCount\":0,\"answerCount\":1,\"acceptedAnswer\":null}}"}}],["$","div",null,{"className":"bg-white shadow-md rounded-lg p-6 mb-6 relative","children":[["$","div",null,{"className":"absolute top-4 right-4 flex flex-wrap space-x-2","children":[["$","span","antlr4",{"className":"bg-blue-600 text-white text-sm px-3 py-1 rounded-full","children":["$","$L10",null,{"href":"/discussion/tag/antlr4/1","children":"antlr4"}]}]]}],["$","div",null,{"className":"flex items-center mb-4","children":[["$","img",null,{"src":"https://www.gravatar.com/avatar/18bec43376eb20e4b27d334b4da4ccc8?s=256&d=identicon&r=PG","alt":"Ihor M.","className":"w-16 h-16 rounded-full border"}],["$","div",null,{"className":"ml-4","children":[["$","a",null,{"href":"https://stackoverflow.com/users/1048185/ihor-m","target":"_blank","rel":"noopener noreferrer","className":"text-lg font-semibold text-blue-600 hover:underline","children":"Ihor M."}],["$","p",null,{"className":"text-sm text-gray-500","children":["Reputation: ",3148]}]]}]]}],["$","h1",null,{"className":"text-2xl font-bold text-gray-800 mb-4","children":"ANTLR4. How to create properly unicode range lexer rules?"}],["$","p",null,{"className":"text-gray-700 mt-4","dangerouslySetInnerHTML":{"__html":"

In my grammar I'd like variables to be composed of Latin, Cyrillic and Mandarin characters. \nFor this purpose I define a lexer rule like this:\nCYRILLIC_RANGE: [\\u0400–\\u04FF];
\nThis is what I see in my ANTLRWorks 2.1 output when I try to run an expression against my query:\nline 1:4 token recognition error at: 'н'\nWhat am I missing?

\n"}}],["$","div",null,{"className":"text-gray-600 text-sm mt-4","children":[["$","p",null,{"children":["Upvotes: ",0]}],["$","p",null,{"children":["Views: ",1463]}]]}]]}],["$","div",null,{"className":"container mx-auto","children":[["$","h2",null,{"className":"text-2xl font-semibold text-gray-800 mb-6","children":["Answers (",1,")"]}],[["$","div","20104116",{"className":"bg-white shadow-md rounded-lg p-6 mb-6","children":[["$","div",null,{"className":"flex items-center mb-4","children":[["$","img",null,{"src":"https://www.gravatar.com/avatar/8fcb4f47c72791dd9e567cec85c762e3?s=256&d=identicon&r=PG","alt":"Dan McGee","className":"w-12 h-12 rounded-full border"}],["$","div",null,{"className":"ml-4","children":[["$","a",null,{"href":"https://stackoverflow.com/users/175045/dan-mcgee","target":"_blank","rel":"noopener noreferrer","className":"text-lg font-semibold text-blue-600 hover:underline","children":"Dan McGee"}],["$","p",null,{"className":"text-sm text-gray-500","children":["Reputation: ",171]}]]}]]}],["$","p",null,{"className":"text-gray-700 mb-4","dangerouslySetInnerHTML":{"__html":"

I'm not sure what you are missing, as this seems to be working for me here. Have you tried the other range syntax? Both of these should be equivalent.

\n\n

CYRILLIC_RANGE : [\\u0400-\\u04FF] ;\nCYRILLIC_RANGE : '\\u0400'..'\\u04FF' ;\n

\n"}}],["$","div",null,{"className":"text-gray-600 text-sm","children":["$","p",null,{"children":["Upvotes: ",2]}]}]]}]]]}],["$","div",null,{"className":"bg-white shadow-md rounded-lg p-6 mt-6","children":[["$","h2",null,{"className":"text-2xl font-semibold text-gray-800 mb-4","children":"Related Questions"}],["$","ul",null,{"className":"list-disc list-inside","children":[["$","li","28126507",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/28126507","className":"text-blue-600 hover:underline","children":"ANTLR4: Using non-ASCII characters in token rules"}]}],["$","li","68364856",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/68364856","className":"text-blue-600 hover:underline","children":"Include certain escapement symbols into ANTLR Lexer rules"}]}],["$","li","49187813",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/49187813","className":"text-blue-600 hover:underline","children":"Exclude chars from range in ANTLR lexer"}]}],["$","li","27541957",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/27541957","className":"text-blue-600 hover:underline","children":"ANTLR4 lexer rules don't work as expected"}]}],["$","li","41222054",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/41222054","className":"text-blue-600 hover:underline","children":"Digit ranges in Antlr4? 
Should lexer rules be unambiguous in Antlr4?"}]}],["$","li","31496503",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/31496503","className":"text-blue-600 hover:underline","children":"antlr4: need to convert sequences of symbols to characters in lexer"}]}],["$","li","27517033",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/27517033","className":"text-blue-600 hover:underline","children":"How to describe String that contains characters with range counts under ANTLR4 lexer rules?"}]}],["$","li","10362300",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/10362300","className":"text-blue-600 hover:underline","children":"Special character handling in ANTLR lexer"}]}],["$","li","7060904",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/7060904","className":"text-blue-600 hover:underline","children":"Antlr Lexer rules"}]}],["$","li","7539110",{"className":"mb-2","children":["$","$L10",null,{"href":"/discussion/solution/7539110","className":"text-blue-600 hover:underline","children":"Characters Matching Multiple Lexer Rules in ANTLR"}]}]]}]]}]]}],["$","$L11",null,{}],["$","$L12",null,{}],["$","$L13",null,{}],["$","$L14",null,{}],["$","$L15",null,{}]]
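
One thing that may be worth checking, offered as a guess rather than a diagnosis: as rendered on this page, the rule in the question separates the two code points with an en dash (U+2013), while both rules in the answer use the ASCII hyphen-minus. As far as I know, only the ASCII hyphen forms a range inside an ANTLR4 `[...]` set, so a lookalike dash would be treated as a literal character. It is not known whether the en dash was in the asker's grammar file or was introduced when the page was rendered. The Python sketch below is not ANTLR code; the helper names `in_cyrillic_range` and `non_ascii_dashes` are made up for illustration. It confirms that the rejected character falls inside U+0400..U+04FF and flags lookalike dashes in a rule string.

```python
# Illustrative sketch only: these helpers are hypothetical, not part of ANTLR.
CYRILLIC_START = 0x0400
CYRILLIC_END = 0x04FF

# Dash-like characters that are easy to confuse with the ASCII hyphen-minus.
LOOKALIKE_DASHES = {"\u2010", "\u2011", "\u2012", "\u2013", "\u2014", "\u2212"}


def in_cyrillic_range(ch: str) -> bool:
    """Return True if a single character lies in U+0400..U+04FF."""
    return CYRILLIC_START <= ord(ch) <= CYRILLIC_END


def non_ascii_dashes(rule_text: str) -> list:
    """Return the code points of lookalike dashes found in rule_text."""
    return ["U+%04X" % ord(c) for c in rule_text if c in LOOKALIKE_DASHES]


# The character from the error message (Cyrillic small letter en, U+043D).
rejected = "\u043d"

# The rule as it appears in the question on this page (assumption: the
# en dash is really in the grammar and is not a rendering artifact).
posted_rule = "CYRILLIC_RANGE: [\\u0400\u2013\\u04FF];"

# The first rule from the answer, which uses the ASCII hyphen-minus.
answer_rule = "CYRILLIC_RANGE : [\\u0400-\\u04FF] ;"

print(in_cyrillic_range(rejected))
print(non_ascii_dashes(posted_rule))
print(non_ascii_dashes(answer_rule))
```

If the second call reports a code point for your own grammar file, retyping the dash as a plain `-` (or switching to the `'\u0400'..'\u04FF'` form from the answer) is a cheap thing to try.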
