How to write an ANTLR4 grammar that matches X number of characters

Question

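I want to use ANTLR4 to parse a format that stores the length of each segment in the serialised form.

For example, to parse "6,Hello 5,World", where each number gives the length of the text that follows the comma (here "Hello " including its trailing space, then "World").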
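
To make the intended behaviour concrete, here is a minimal hand-written sketch in Python (not ANTLR) of what I want the parser to do. The name parse_segments is only illustrative, and the sketch assumes each length is a decimal count of characters followed by a single comma.

```python
def parse_segments(text):
    """Split a string of '<length>,<payload>' segments into a list of payloads."""
    segments = []
    pos = 0
    while pos < len(text):
        # The length field runs up to the next comma (assumption: decimal digits).
        comma = text.index(",", pos)
        length = int(text[pos:comma])
        start = comma + 1
        end = start + length
        if end > len(text):
            raise ValueError("segment runs past the end of the input")
        # Take exactly `length` characters, whatever they are (spaces included).
        segments.append(text[start:end])
        pos = end
    return segments


if __name__ == "__main__":
    print(parse_segments("6,Hello 5,World"))
```

The key point is that the number of characters to consume is only known after the length has been read, which is the part I cannot express in the grammar.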

I tried to create a grammar like this:

grammar myGrammar;

sequence:
 (LEN ',' TEXT)*;

LEN: [0-9]+;
TEXT: // I don't know what to put here, but it should match LEN characters

Is this even possible with ANTLR?

A real-world example of this would be parsing the MessagePack binary format, which has several types that store the length of the data in the serialised form.

For example, there is the str8 type:

str 8 stores a byte array whose length is up to (2^8)-1 bytes:
+--------+--------+========+
|  0xd9  |YYYYYYYY|  data  |
+--------+--------+========+

And the str16 type:

str 16 stores a byte array whose length is up to (2^16)-1 bytes:
+--------+--------+--------+========+
|  0xda  |ZZZZZZZZ|ZZZZZZZZ|  data  |
+--------+--------+--------+========+

In these examples, the first byte identifies the type. It is followed by the length of the data, stored in 1 byte for str8 and 2 bytes for str16, and finally by the data itself.
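
As a rough illustration of the layout described above, a hand-written decoder in Python (again, not ANTLR) might look like the sketch below. decode_str is a hypothetical helper name. The sketch assumes the length field is a big-endian unsigned integer, as in the MessagePack specification, and it returns the raw bytes without interpreting them.

```python
def decode_str(buf, pos=0):
    """Decode one str8/str16 value from `buf` starting at `pos`.

    Returns (data, next_pos), where `data` is the raw payload bytes.
    """
    tag = buf[pos]
    if tag == 0xD9:        # str8: 1 length byte
        length_size = 1
    elif tag == 0xDA:      # str16: 2 length bytes
        length_size = 2
    else:
        raise ValueError(f"unsupported type byte 0x{tag:02x}")

    length_start = pos + 1
    data_start = length_start + length_size
    if data_start > len(buf):
        raise ValueError("truncated length field")
    # Assumption: the length is stored big-endian.
    length = int.from_bytes(buf[length_start:data_start], "big")

    data_end = data_start + length
    if data_end > len(buf):
        raise ValueError("data runs past the end of the buffer")
    return buf[data_start:data_end], data_end


if __name__ == "__main__":
    print(decode_str(b"\xd9\x05Hello"))
    print(decode_str(b"\xda\x00\x05Hello"))
```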

I think a rule might look something like this, but I don't know how to match the right amount of data:

str8 : '\u00d9' BYTE DATA ;
str16: '\u00da' BYTE BYTE DATA ;

BYTE : '\u0000'..'\u00FF' ;
DATA : ???
