Martin

Reputation: 1

Mixing two languages

I am writing a grammar for a small meta language. That language should include code blocks of another language (e.g., JavaScript, C, or the like). I would like to treat these code blocks just as plain strings that are printed out unchanged. My language is based on C/Java syntax and uses { } for code blocks, but I would also like to use { } for the code blocks of the embedded language. Here is some example code:

// my language
modul Abc {
   input x: string;
   otherLang {
      // this is now a code block from the second
      // language, which I do not want to analyze
      // It might itself contain { } like
      if (something) {
          abc = "string";
      }
   }
}

How would I reuse { and } for those different purposes without mixing them up with the braces of the embedded language?

Upvotes: 0

Views: 113

Answers (1)

GRosenberg

Reputation: 6001

An interesting way to do this is to use mode recursion. The ANTLR 4 lexer maintains a mode stack internally, so a lexer mode can push itself again on every nested { and pop on the matching }.

Although a bit verbose, the recursed mode offers the possibility of handling things -- like comments, string literals, and escaped chars -- whose braces could otherwise throw off the nesting.

One thing to be aware of is that rules with the more command do not emit a token of their own; their matched text is accumulated and prepended to the token produced by the next rule that matches without more. The following example uses the virtual token OTHER_END for semantic clarity, so that the closing brace of an embedded block is not confused with an ordinary RPAREN token.

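// Note: ANTLR 4 allows modes only in a lexer grammar, so in practice the
// parser rule below goes into a separate parser grammar.
tokens {
    OTHER_END
}

otherLang : OTHER_BEG OTHER_END+ ; // multiple 'end's dependent on nesting

// Optional whitespace before the brace, so that 'otherLang {' matches.
OTHER_BEG : 'otherLang' [ \t\r\n]* LParen -> pushMode(Other) ;
LPAREN    : LParen ;
RPAREN    : RParen ;
WS        : [ \t\r\n]+ -> skip ;

mode Other ;
    // handle special cases here (e.g. strings and comments, each with
    // -> more, so that braces inside them are not counted)
    O_RPAREN : RParen -> type(OTHER_END), popMode ;          // ends the token
    O_LPAREN : LParen -> more, pushMode(Other) ;             // nested block
    O_STUFF  : .      -> more ;                              // anything else

fragment LParen : '{' ;
fragment RParen : '}' ;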
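
To make the token stream of the grammar above easier to picture, here is a rough Python sketch that imitates the mode stack and the more accumulation by hand. It is not ANTLR's implementation, only an illustration: the function name lex_other_block is made up, and it assumes the embedded language uses C-style "..." strings and // line comments as its special cases.

```python
def lex_other_block(text, start):
    """Scan an embedded block; `start` is the index just after its opening '{'.

    Returns (tokens, end): the texts of the emitted OTHER_END tokens and the
    index just past the closing '}' of the whole block.
    """
    mode_stack = ["Other"]  # OTHER_BEG has already pushed mode Other once
    pending = []            # text collected by 'more' rules, not yet emitted
    tokens = []
    i = start
    while i < len(text) and mode_stack:
        ch = text[i]
        if ch == '"':
            # Special case (assumed): string literal, braces inside are ignored.
            j = i + 1
            while j < len(text) and text[j] != '"':
                j += 2 if text[j] == "\\" else 1  # skip escaped characters
            j = min(j + 1, len(text))
        elif text.startswith("//", i):
            # Special case (assumed): line comment, braces inside are ignored.
            j = text.find("\n", i)
            j = len(text) if j == -1 else j
        else:
            j = i + 1
            if ch == "{":
                mode_stack.append("Other")  # like O_LPAREN: more, pushMode
            elif ch == "}":
                mode_stack.pop()            # like O_RPAREN: popMode
        pending.append(text[i:j])
        if ch == "}":
            # O_RPAREN has no 'more', so the accumulated text becomes a token.
            tokens.append("".join(pending))
            pending = []
        i = j
    if mode_stack:
        raise ValueError("unbalanced braces in embedded block")
    return tokens, i


src = 'otherLang { if (x) { s = "}"; } } rest'
tokens, end = lex_other_block(src, src.index("{") + 1)
body = "".join(tokens)[:-1]  # drop the block's own closing brace
print(tokens)
print(repr(body))
```

Each closing brace ends one token, which is why the parser rule expects OTHER_END+ rather than a single token; joining the token texts gives back the embedded code unchanged.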

Upvotes: 1
