kkm mistrusts SE
kkm mistrusts SE

Reputation: 5510

Can a flex/bison parser accept at most one token from a set in any order

In my language, at a specific point I need to accept at most one token from a set of tokens. To give an example, a parenthesized expression may be followed by at most one of !^& in any order, so the following 2 lines should be equivalent

(foo)!^
(foo)^!

and the following one is illegal (a token repeats twice)

(foo)^!^

Is this doable, short of exhausting all possibilities with CFG rules, naturally? Either lexical (flex) or syntactical (bison) level will do.

Upvotes: 0

Views: 423

Answers (1)

rici
rici

Reputation: 241671

There is no way of doing this with either a regular expression or a CFG other than enumerating all the possibilities, which is a factorial amount of work. (By grouping, the actual size can be reduced, but it is still exponential.) If you only have one instance, and there are only three tokens, then listing is probably the easiest solution.

But if there are various tokens, and you might want to expand the list in the future, it's probably easier to allow all combination of tokens, but associate a bitmap with the token-list to make it easy to check for duplication, which presumably should result in an error message.

Here's a simple flex solution for the precise case you mention. (In my original, I duplicated a lot of code, but I think the following is easier to read.) The <MODS> start condition is triggered by the first appearance of [&^!] and serves to absorb the rest of them; when any other character is encountered, it is marked to be rescanned (yyless(0)) and the current mask of modifiers is returned.

%{
  // The MODS token has %type <ModMask>, and the associated
  // semantic value will be the union of the enum bits.
  typedef unsigned int ModMask;
  enum { MOD_HAT=1, MOD_BANG=2, MOD_AMP=4 };
  // This function maps a character to a modmask value.
  // A real implementation would use a lookup table. I just included
  // this definition so the snippet is usable.
  ModMask tokenToMask(unsigned char c) {
    return c == '^' ? MOD_HAT : 
           c == '!' ? MOD_BANG :
           c == '&' ? MOD_AMP : 0;
%}

%x SC_MODS

%%

[&^!]       { yylval.modmask = tokenToMask(yytext[0]; BEGIN(MODS); }
<MODS>[&^!] { ModMask m = tokenToMask(yytext[0];
              if (yylval.modmask & m) {
                yyerror("Duplicate modifier");
              }
              yylval.modmask |= m;
            }
<MODS>.|\n  { yyless(0); BEGIN(INITIAL); return MODS; }

Upvotes: 1

Related Questions