Koy
Koy

Reputation: 628

Expression grammar with different types of data

I'm writing an interpreter for a small language that deals with vectors. I'm using Flex and Bison.

Vectors are declared with the following notation:

v := <1.4, -2.2, 7>

So, their components can also include negative numbers. Supported operations on those vectors are addition, subtraction and scalar multiplication. So you cannot add a vector and scalar, you cannot subtract a vector and a scalar, but you can multiply a vector and a scalar.

Since negative numbers are supported, my lexical analyzer uses the following regular expression to match numbers:

[+-]?[0-9]+([.][0-9]+)? {yylval.somedub = atof(yytext); return NUMBER;}

The problem occurs when an expression of the following format is to be parsed (exactly the same problem with a plus, of course):

v-2*v

The way it should be parsed is: vector minus (number times vector). But, as soon as flex sees -2 it will interpret it as a number, so I get vector (number) times vector which, of course, makes no sense. On the other hand if you were to write v - 2*v it works fine, since there's a space between the - and the 2. The expression part of my grammar looks like this (I'm not going to copy the entire code because it's pretty big):

expression:
    expression '+' level_1
    | expression '-' level_1
    | level_1

level_1:
    NUMBER '*' level_1
    | level_1 '*' NUMBER
    | level_2

level_2:
    '(' expression ')'
    | vector //parses the whole <a, b, c, ..> notation, irrelevant for the problem

I've also declared +, - and * as left-associative using

%left '+' '-'
%left '*'

So how would I go about fixing this problem? I don't know whether I need to change the associativity somehow or maybe reconstruct the whole grammar.

Any ideas?

Thanks.

Upvotes: 1

Views: 72

Answers (1)

sepp2k
sepp2k

Reputation: 370455

If -2 is recognized as a single NUMBER token, that means the parser will see v-2 as a name followed by a number and at that point, there's not really anything you can do to get the parse you want. So instead -2 should be recognized as two tokens: a minus sign followed by a number.

To achieve this, you can simply remove [+-]? from your regex for numbers (I'm assuming you already have a rule to recognize + and - on their own).

Now you only need to adjust your grammar to allow - or + followed by a number (or by any expression if you also want to allow something like -v or -(2+3)).

Upvotes: 2

Related Questions