J.W.Sievers

Reputation: 75

GNU Flex: How does yyunput work?

I am having trouble understanding how flex's yyunput behaves.

I want to put back some characters.

For example, my scanner matches CALL{blank}{cc}, using these definitions and rules:

cc          N?Z|N?C|P[OE]?|M

%%
CALL{blank}{cc}         {BEGIN CON; return yy::ez80asm_parser::make_CALL(loc);}
CALL{mmode}{blank}{cc}  {BEGIN CON; return  yy::ez80asm_parser::make_CALL(loc);}
CALL                    {BEGIN ARG; return  yy::ez80asm_parser::make_CALL(loc);}

I want to give back the characters matched by {cc} so that they are scanned again as the next token.

What do the two arguments of yyunput have to be? I couldn't find any helpful information about that function.

Any hints are welcome.

Jürgen

Upvotes: 1

Views: 3178

Answers (1)

rici

Reputation: 241771

You can't "give back the {cc}" because the regular expression doesn't have pieces. (Flex does not do captures, either, so it wouldn't help to put parentheses around it.)

If you just want to rescan part of a token, it is much better to use yyless than unput, since yyless mostly just changes a pointer. With a single call to yyless you can return as many characters as you like, so you only need to know how many characters to return. (More precisely, you tell it how many characters you want to keep in yytext; the remainder are returned and yytext is truncated accordingly.)

For reference, unput is a macro whose single argument is a single character which will be pushed onto the beginning of the unconsumed input, overwriting yytext as it goes. (In the C++ API, it calls the internal member function ::yyunput, supplying it an additional necessary argument. Don't call this function directly.)

If you need to push several characters onto the input, you need to unput them one at a time, starting with the last one. Since unput destroys the value of yytext, you need to make sure that you've already copied it if you need it before calling unput.

In your case, I think neither of these is appropriate. What you probably want to do is to not include the {cc} pattern in the match in the first place, which you can do with flex's trailing context operator /. (That assumes that you don't need to include the characters matched by {cc} in the semantic value you will be returning; in the example provided, yytext does not appear to be part of the semantic value, so the assumption should be safe.) To do so, you might write something like:

CALL{mmode}?{blank}/{cc}  {BEGIN CON; return yy::ez80asm_parser::make_CALL(loc);}
CALL                      {BEGIN ARG; return yy::ez80asm_parser::make_CALL(loc);}

(Note: I combined your first two patterns into a single one since they seem to have the same action, but if you actually need the characters matched by {mmode} you might not want to do that.)

If that doesn't work, for whatever reason, use yyless. You'll need to know how many characters you want to return to the input, so I imagine you would end up with something like:

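CALL{mmode}?{blank}{cc}  { BEGIN CON;
                           /* Keep everything before {cc}; yyless returns the rest. */
                           int to_keep = yyleng - 1;  /* index of the last character */
                           switch (yytext[to_keep]) {
                             case 'C': case 'Z':      /* C, Z, NC or NZ */
                               if (yytext[to_keep - 1] == 'N') --to_keep;
                               break;
                             case 'E': case 'O': --to_keep; break;  /* PE or PO */
                             case 'P': case 'M': break;             /* P or M */
                             default: assert(false); /* internal error; needs <assert.h>/<cassert> */
                           }
                           yyless(to_keep);
                           return yy::ez80asm_parser::make_CALL(loc);
                         }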

For details on the trailing context operator, see the Flex manual section on patterns (search for the word "trailing"; there is an important note towards the end as well) as well as the first paragraph of the following chapter on matching. yyless and unput are both documented in the chapter on actions, which includes examples of their usage.

Upvotes: 2
