Handle strings starting with whitespaces

Question

I'm trying to create an ANTLR v4 grammar with the following set of rules:

1. In case a line starts with @, it is considered a label:

@label

2. In case the line starts with cmd, it is treated as a command:

cmd param1 param2

3. If a line starts with whitespace, it is considered a string. All the text should be extracted. Strings can be multiline, so they end with an empty line:

 A long string with multiline support
 and any special characters one can imagine.
<-empty line here->

4. Lastly, in case a line starts with anything but whitespace, @ and cmd, its first word should be considered a heading:

Heading A long string with multiline support
 and any special characters one can imagine.
<-empty line here->

It was easy to handle labels and commands, but I am clueless about strings and headings. What is the best way to separate whitespace word whitespace whatever doubleNewline and whatever doubleNewline? I've seen a lot of samples with whitespace, but none of them works with both random text and newlines. I don't expect you to write actual code for me. Suggesting an approach will do.

Bart Kiers · Accepted Answer

Something like this should do the trick:

lexer grammar DemoLexer;

LABEL
 : '@' [a-zA-Z]+
 ;

CMD
 : 'cmd' ~[\r\n]+
 ;

STRING
 : ' ' .*? NL NL
 ;

HEADING
 : ( ~[@ \t\r\nc] | 'c' ~'m' | 'cm' ~'d' ).*? NL NL
 ;

SPACE
 : [ \t\r\n] -> skip
 ;

OTHER
 : .
 ;

fragment NL
 : '\r'? '\n'
 | '\r'
 ;

This does not enforce the "beginning of the line" requirement. If you need that, you'll have to add semantic predicates to your grammar, which ties it to a target language. For Java, that would look like this:

LABEL
 : {getCharPositionInLine() == 0}? '@' [a-zA-Z]+
 ;
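
The same predicate could presumably be placed on the other line-oriented rules as well. The following is an untested sketch for the Java target, assuming it replaces the CMD, STRING and HEADING rules of the lexer grammar above and reuses its NL fragment; only the leading predicate is new:

```antlr
// Sketch only: each rule can now match solely at column 0 of a line.
// getCharPositionInLine() is the Java target's Lexer method used in LABEL above.
CMD
 : {getCharPositionInLine() == 0}? 'cmd' ~[\r\n]+
 ;

STRING
 : {getCharPositionInLine() == 0}? ' ' .*? NL NL
 ;

HEADING
 : {getCharPositionInLine() == 0}? ( ~[@ \t\r\nc] | 'c' ~'m' | 'cm' ~'d' ) .*? NL NL
 ;
```

With the predicates in place, text that appears mid-line and matches none of these rules should fall through to SPACE or OTHER, one character at a time.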
