Nate Glenn

Reputation: 6744

JavaCC quote with escape character

What is the usual way of tokenizing quoted strings that can contain an escape character? Here are some examples:

1) "this is good"
2) "this is\"good\""
3) "this \is good"
4) "this is bad\"
5) "this is \\"bad"
6) "this is bad
7)  this is bad"
8)  this is bad

Below is a sample parser that doesn't work quite right: it gives the expected result for every example except 4 and 5, which parse successfully when they should fail.

options
{
  LOOKAHEAD = 3;
  CHOICE_AMBIGUITY_CHECK = 2;
  OTHER_AMBIGUITY_CHECK = 1;
  STATIC = false;
  DEBUG_PARSER = false;
  DEBUG_LOOKAHEAD = false;
  DEBUG_TOKEN_MANAGER = true;
  ERROR_REPORTING = true;
  JAVA_UNICODE_ESCAPE = false;
  UNICODE_INPUT = false;
  IGNORE_CASE = false;
  USER_TOKEN_MANAGER = false;
  USER_CHAR_STREAM = false;
  BUILD_PARSER = true;
  BUILD_TOKEN_MANAGER = true;
  SANITY_CHECK = true;
  FORCE_LA_CHECK = true;
}

PARSER_BEGIN(MyParser)
import java.io.ByteArrayInputStream;
import java.io.UnsupportedEncodingException;
public class MyParser {
    public static void main(String[] args) throws UnsupportedEncodingException, ParseException{
        //note that this conversion to an input stream is only good for small strings
        MyParser parser = new MyParser(new ByteArrayInputStream(args[0].getBytes("UTF-8")));
        parser.enable_tracing();
        parser.myProduction();
        System.out.println("Must have worked!");
    }
}
PARSER_END(MyParser)

TOKEN:
{
<QUOTED: 
    "\"" 
    (
        "\\" ~[]    //any escaped character
        |           //or
        ~["\""]      //any non-quote character
    )* 
    "\""
>
}


void myProduction() :
{}
{
    <QUOTED>
    <EOF>
}

You can run MyParser from the command line with the input to parse as its first argument. It will print "Must have worked!" if the input parsed, or throw an error if it didn't.

How do I change this parser to correctly fail on examples 4 and 5?

Upvotes: 3

Views: 2737

Answers (1)

Theodore Norvell

Reputation: 16231

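Your second alternative, ~["\""], also matches a backslash, so the token manager can treat a backslash as an ordinary character and then take the quote after it as the closing quote. To fix your regular expression, exclude the backslash from that alternative, so that a backslash can only be consumed together with the character it escapes: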

TOKEN: {
<QUOTED: 
    "\"" 
    (
         "\\" ~[]     //any escaped character
    |                 //or
        ~["\"","\\"]  //any character except quote or backslash
    )* 
    "\"" > 
}
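
If you want to sanity-check the two token definitions without regenerating the parser, here is a rough sketch that transcribes them into java.util.regex patterns and runs them over the eight examples from the question. This is only an approximation: java.util.regex backtracks while JavaCC's token manager picks the longest match, but for the question "does the whole input form exactly one QUOTED token" the two should agree. The class name QuotedCheck and the names ORIGINAL, FIXED and accepts are made up for this illustration and are not part of JavaCC.

```java
import java.util.regex.Pattern;

public class QuotedCheck {
    // Transcription of the question's token: "\"" ( "\\" ~[] | ~["\""] )* "\""
    // DOTALL makes '.' match any character, like ~[] in JavaCC.
    static final Pattern ORIGINAL =
        Pattern.compile("\"(?:\\\\.|[^\"])*\"", Pattern.DOTALL);

    // Transcription of the fixed token: "\"" ( "\\" ~[] | ~["\"","\\"] )* "\""
    static final Pattern FIXED =
        Pattern.compile("\"(?:\\\\.|[^\"\\\\])*\"", Pattern.DOTALL);

    // True if the entire input is one quoted string according to the pattern,
    // which mirrors the production <QUOTED> <EOF>.
    static boolean accepts(Pattern p, String input) {
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        // The eight examples from the question, written as Java literals.
        String[] examples = {
            "\"this is good\"",          // 1
            "\"this is\\\"good\\\"\"",   // 2
            "\"this \\is good\"",        // 3
            "\"this is bad\\\"",         // 4
            "\"this is \\\\\"bad\"",     // 5
            "\"this is bad",             // 6
            "this is bad\"",             // 7
            "this is bad"                // 8
        };
        for (int i = 0; i < examples.length; i++) {
            System.out.println((i + 1) + ") " + examples[i]
                + "  original=" + accepts(ORIGINAL, examples[i])
                + "  fixed=" + accepts(FIXED, examples[i]));
        }
    }
}
```

With these patterns the original definition accepts examples 1 through 5, while the fixed one accepts only 1 through 3, which is the behaviour the question asks for.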

Upvotes: 16
