Ihor M.

Reputation: 3148

ANTLR3 does not match extended ASCII characters

I'm using ANTLRWorks to test a grammar I wrote, and one of the rules uses the BULLET symbol •. When the parse tree is built, though, the character is escaped every time. I also tried other characters from the extended ASCII table, and they are omitted as well. Is this a known bug, or do I need to enable extended ASCII characters somehow?

Upvotes: 0

Views: 334

Answers (1)

Sam Harwell

Reputation: 100029

ANTLR 3.x through 4.0 can match any UTF-16 code unit except U+FFFF. ANTLR 4.1 will be able to match U+FFFF as well. To match characters in the range U+10000 to U+10FFFF, you'll need to explicitly encode them as UTF-16 surrogate pairs in your grammar.
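To illustrate the surrogate-pair point, here is a minimal Java sketch that prints the UTF-16 code units for a given code point as backslash-u escapes, which is the form I would expect to paste into a lexer character literal. The class name `SurrogateDemo` and the helper `toGrammarEscape` are made up for this example and are not part of ANTLR. The bullet is U+2022, so it fits in a single code unit, while something like U+1F600 needs two.

```java
public class SurrogateDemo {
    // Hypothetical helper (not an ANTLR API): writes a code point as one
    // backslash-u escape per UTF-16 code unit.
    static String toGrammarEscape(int codePoint) {
        StringBuilder sb = new StringBuilder();
        // Character.toChars returns one char for BMP code points and a
        // high/low surrogate pair for code points above U+FFFF.
        for (char unit : Character.toChars(codePoint)) {
            sb.append(String.format("\\u%04X", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // BULLET: a single code unit, so a single escape.
        System.out.println(toGrammarEscape(0x2022));
        // A supplementary code point: two escapes, high surrogate first.
        System.out.println(toGrammarEscape(0x1F600));
    }
}
```

The second call produces two escapes; in the grammar those two code units would have to be matched one after the other, since the lexer only ever sees individual UTF-16 code units.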

Upvotes: 1
