S Shruthi

Reputation: 35

I need to use JavaCC to parse a string that contains single quotes as part of the string.

I have defined grammar rules like

TOKEN : { < SINGLE_QUOTE : " ' " > }

TOKEN : {  < STRING_LITERAL : " ' "  (~["\n","\r"])*  " ' ">

But I am not able to parse sequences like 're'd'. I need the parser to treat re'd as a string literal, but with these rules it parses 're' and 'd' separately.

Upvotes: 0

Views: 1039

Answers (2)

sarath kumar

Reputation: 380

If you need to lex re'd as a STRING_LITERAL token, use the following rules:

TOKEN : { < SINGLE_QUOTE : "'" > }
TOKEN : {  < STRING_LITERAL : "'"?  (~["\n","\r"])*  "'"? > }

I don't see a rule in your grammar that would match "re" separately.

In JavaCC, your lexical specification of STRING_LITERAL requires the token to start with a single quote ("'"), but your input does not have a "'" at the start.

The "?" added to STRING_LITERAL makes each single quote optional (at most one occurrence if present), so the rule will match your input and lex it as STRING_LITERAL.

JavaCC decision making rules:

1.) JavaCC looks for the longest match. In this case, even if the input starts with "'", the possible matches are SINGLE_QUOTE and STRING_LITERAL. The characters that follow decide which token is chosen: if the match can be extended past the first "'", the longer STRING_LITERAL wins.

2.) When several rules match the same length, JavaCC takes the rule declared first in the grammar. Here, if the input is only "'", it will be lexed as SINGLE_QUOTE even though there are two possible matches, SINGLE_QUOTE and STRING_LITERAL.
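To make the two rules above concrete, here is a rough simulation in plain Java. It is not JavaCC itself and not code that JavaCC generates; it only mimics "longest match, then first declared" using java.util.regex. The class name MaximalMunchSketch, the method firstToken, and the regex translations of the two token rules are all made up for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of JavaCC: mimics the two decision rules.
class MaximalMunchSketch {
    // Rules in declaration order; LinkedHashMap keeps insertion order.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        // Assumed regex equivalents of the two token definitions above.
        RULES.put("SINGLE_QUOTE", Pattern.compile("'"));
        RULES.put("STRING_LITERAL", Pattern.compile("'?[^\\n\\r]*'?"));
    }

    // Returns "KIND:image" for the token chosen at the start of the input,
    // or null if no rule matches at least one character.
    static String firstToken(String input) {
        String bestKind = null;
        int bestLength = 0;
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            // Strict '>' means a later rule only wins with a longer match,
            // so ties go to the rule declared first.
            if (m.lookingAt() && m.end() > bestLength) {
                bestKind = rule.getKey();
                bestLength = m.end();
            }
        }
        return bestKind == null ? null : bestKind + ":" + input.substring(0, bestLength);
    }

    public static void main(String[] args) {
        System.out.println(firstToken("re'd")); // STRING_LITERAL:re'd
        System.out.println(firstToken("'"));    // SINGLE_QUOTE:'
    }
}
```

For the lone "'" both rules match one character, so the tie goes to SINGLE_QUOTE because it is declared first; for re'd only STRING_LITERAL matches at all.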

Hope this will help you...

Upvotes: 2

Theodore Norvell

Reputation: 16231

The following should work:

TOKEN : { < SINGLE_QUOTE : "'" > }
TOKEN : {  < STRING_LITERAL : "'"  (~["\n","\r"])*  "'"> }

This is pretty much what you had, except that I removed some spaces.

Now if there are two or more apostrophes on a line (i.e. without an intervening newline or return), then the first and the last of those apostrophes, together with all characters between them, should be lexed as one STRING_LITERAL token. That includes all intervening apostrophes. This assumes there are no other rules involving apostrophes. For example, if your file is 're'd' that should lex as one token; likewise 'abc' + 'def' should lex as one token.
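As a quick sanity check of that behaviour outside JavaCC, the same pattern can be tried with java.util.regex. This is only an approximation under the assumption that the regex '[^\n\r]*' corresponds to the STRING_LITERAL rule above; the class name GreedyLiteralSketch and the method literalAt are hypothetical and exist only for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: approximates the STRING_LITERAL rule with a regex.
class GreedyLiteralSketch {
    // An apostrophe, any characters except newline or return, an apostrophe.
    static final Pattern STRING_LITERAL = Pattern.compile("'[^\\n\\r]*'");

    // Returns the literal matched at the start of the input, or null.
    static String literalAt(String input) {
        Matcher m = STRING_LITERAL.matcher(input);
        // The greedy '*' runs to the end of the line and backs off to the
        // last apostrophe, which is the longest possible match here.
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(literalAt("'re'd'"));         // 're'd'
        System.out.println(literalAt("'abc' + 'def';")); // 'abc' + 'def'
    }
}
```

The second call shows the side effect mentioned above: two separate quoted strings on one line come out as a single match, so this rule only suits inputs with at most one literal per line.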

Upvotes: 2
