Noor

Reputation: 20150

Define token to match any string

I am new to JavaCC. I am trying to define a token that matches any string. I tried the regular expression <ANY: (~[])+>, but it does not work. What I want is simple: to parse an expression with the following BNF:

<exp> ::= "path(" <string> "," <number> ")"

My current .jj file is below. Any help on how to parse the string would be appreciated:

options
{
}
PARSER_BEGIN(SimpleAdd)
package SimpleAddTest;
public class SimpleAdd
{
}
PARSER_END(SimpleAdd)
SKIP :
{
    " "
|   "\r"
|   "\t"
|   "\n"
}
TOKEN:
{
    < NUMBER: (["0"-"9"])+  > |
    <PATH: "path"> |
    <RPAR: "("> |
    <LPAR: ")"> |
    <QUOTE: "'"> |
    <COMMA: ","> |
    <ANY: (~[])+>


}

int expr():
{
    String leftValue ;
    int rightValue ;
}
{

        <PATH> <RPAR> <QUOTE> leftValue = str() <QUOTE> <COMMA> rightValue = num() <LPAR>
    { return 0; }
}

String str():
{
    Token t;
}
{

    t = <ANY> { return t.toString(); }
}

int num():
{
    Token t;
}
{
    t = <NUMBER> { return Integer.parseInt(t.toString()); }
}

The error I am getting with the above javacc file is:

Exception in thread "main" SimpleAddTest.ParseException: Encountered " <ANY> "path(\'5\',1) "" at line 1, column 1.
Was expecting:
    "path" ...

Upvotes: 3

Views: 2006

Answers (1)

Theodore Norvell

Reputation: 16231

The pattern <ANY: (~[])+> will indeed match any nonempty string. The problem is that this is not what you really want. A rule <ANY: (~[])+> will match the whole file, unless the file is empty. Because of the longest-match rule, ANY beats every shorter token such as "path", so in most cases the whole file is tokenized as [ANY, EOF]. That is why the parser reports an <ANY> token at line 1, column 1 where it was expecting "path". Is that really what you want? Probably not.
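To make the longest-match behaviour concrete, here is a minimal sketch in plain Java. It is not JavaCC's actual tokenizer: the class name LongestMatchDemo and the hand-written regexes are assumptions that only approximate the token definitions from the question, and whitespace skipping is ignored.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LongestMatchDemo {
    // Token kinds in declaration order, mirroring the question's TOKEN block.
    // The regexes are hand-written approximations of the JavaCC patterns.
    private static final Map<String, Pattern> TOKENS = new LinkedHashMap<>();
    static {
        TOKENS.put("NUMBER", Pattern.compile("[0-9]+"));
        TOKENS.put("PATH", Pattern.compile("path"));
        TOKENS.put("RPAR", Pattern.compile("\\("));
        TOKENS.put("LPAR", Pattern.compile("\\)"));
        TOKENS.put("QUOTE", Pattern.compile("'"));
        TOKENS.put("COMMA", Pattern.compile(","));
        TOKENS.put("ANY", Pattern.compile(".+", Pattern.DOTALL)); // (~[])+
    }

    /** Returns "KIND:image" for the first token of a nonempty input. */
    public static String firstToken(String input) {
        String bestKind = null;
        int bestLen = 0;
        for (Map.Entry<String, Pattern> e : TOKENS.entrySet()) {
            Matcher m = e.getValue().matcher(input);
            // Only a strictly longer match replaces the current best,
            // so ties go to the token declared earlier.
            if (m.lookingAt() && m.end() > bestLen) {
                bestKind = e.getKey();
                bestLen = m.end();
            }
        }
        return bestKind + ":" + input.substring(0, bestLen);
    }

    public static void main(String[] args) {
        System.out.println(firstToken("path('5',1)")); // ANY swallows everything
        System.out.println(firstToken("path"));        // tie: PATH is declared first
    }
}
```

For the input from the question, ANY matches all eleven characters while PATH matches only four, so ANY is chosen and the parser never sees a PATH token.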

So I'm going to guess at what you really want. I'll guess you want any string that doesn't include a double quote character. Maybe there are other restrictions, such as no nonprinting characters. Maybe you want to allow double quotes if they are preceded by a backslash. Who knows? Adjust as needed.
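As one illustration of the "double quotes preceded by a backslash" variant, the sketch below uses a Java regex to try the idea outside JavaCC. The class name QuotedStringDemo and the pattern are assumptions, not something from the question. A JavaCC token along the lines of < STRING: "\"" ( ~["\"","\\"] | "\\" ~[] )* "\"" > would be the rough equivalent.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedStringDemo {
    // Double-quoted string in which a backslash escapes the next character.
    // As a raw regex: "(?:[^"\\]|\\.)*"
    private static final Pattern STRING =
        Pattern.compile("\"(?:[^\"\\\\]|\\\\.)*\"", Pattern.DOTALL);

    /** Returns the quoted string at the start of input, or null if there is none. */
    public static String leadingString(String input) {
        Matcher m = STRING.matcher(input);
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(leadingString("\"a/b\",1)"));
        System.out.println(leadingString("\"say \\\"hi\\\"\",1)"));
    }
}
```

Unlike (~[])+, this pattern stops at the first unescaped closing quote, so the comma and the number that follow remain available as separate tokens.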

Here is what you can do. First, replace the token definitions with

TOKEN:
{
    < NUMBER: (["0"-"9"])+  > |
    <PATH: "path"> |
    <RPAR: "("> |
    <LPAR: ")"> |
    <COMMA: ","> |
    <STRING: "\"" (~["\""])* "\"" >
}

Then change your grammar to

int expr():
{
    String leftValue ;
    int rightValue ;
}
{    
        <PATH> <RPAR> leftValue=str() <COMMA> rightValue = num() <LPAR>
    { return 0; }
}

String str():
{
    Token t;
    int len ;
}
{    
    t = <STRING>
    { len = t.image.length() ; }
    { return t.image.substring(1,len-1); }
}
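
The quote stripping done in str() can be checked in isolation with plain Java. This is a small sketch, and the class and method names (UnquoteDemo, stripQuotes) are made up for illustration. It assumes the image always starts and ends with a double quote, which the STRING token definition guarantees.

```java
public class UnquoteDemo {
    /** Drops the first and last character of a token image, as str() does. */
    public static String stripQuotes(String image) {
        int len = image.length();
        // A STRING image is at least two characters long (the two quotes),
        // so substring(1, len - 1) is always in range for such images.
        return image.substring(1, len - 1);
    }

    public static void main(String[] args) {
        System.out.println(stripQuotes("\"5\""));     // the image "5" with its quotes
        System.out.println(stripQuotes("\"a/b/c\"")); // a longer path-like string
    }
}
```

An empty quoted string (just two quote characters) yields the empty string, since substring(1, 1) is empty.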

Upvotes: 5
