boyman

Reputation: 1

regular expression in lex

I want to print a line only when the whole line matches the pattern. But as it is, the line is printed even if only part of it matches the pattern. How should I write the regular expression?

Upvotes: 0

Views: 1168

Answers (2)

mevets

Reputation: 10445

The default rule in lex is to echo any input that no rule matches; so if you add a line:

.|\n { }

at the end of your rules, you will prevent it from echoing the unmatched input. (The \n alternative is needed because . does not match a newline.) Next up, if you want your pattern to be limited to a single line, you need to include the newline in your rule:

((100+1+)|(01))+\n {printf("%s",yytext); /* yytext already ends in a newline */}

But notice I made an assumption about where your newline should go; it could just as well have been:

((100+1+)|(01))\n+ {printf("%s",yytext);}

For an entirely different effect.
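As a rough sketch, a complete flex file built around the first variant might look like the following. The ^ anchor, %option noyywrap, and the main function are additions for illustration and are not part of the rules above. The ^ is there on the assumption that a match should not be allowed to start in the middle of a line after the catch-all rule has already eaten its first few characters.

```lex
%option noyywrap
%{
#include <stdio.h>
%}
%%
^((100+1+)|(01))+\n    { printf("%s", yytext); /* whole line matched; yytext ends in a newline */ }
.|\n                   { /* anything else: discard one character at a time */ }
%%
int main(void)
{
    yylex();    /* scans stdin until end of file */
    return 0;
}
```

Assuming the file is saved as scan.l (a hypothetical name), it should build with `flex scan.l` followed by `cc lex.yy.c -o scan`.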

Lex is a sharp tool.

Upvotes: 0

rici

Reputation: 241721

You need to do two things:

  1. Insist that the pattern matches up to the end of the line:

     ((100+1+)|(01))+\n      {printf("%s",yytext);}

     (\n matches the end-of-line character, so yytext already ends with a newline.)

  2. Include an alternative pattern to catch the lines not matched by the first pattern:

     .*\n?     { /* Maybe do something here */ }

You need to put these two rules in that order, because the second one will match any line at all, correct or not. When two rules match the same amount of text, (f)lex uses the one that appears first in the file, so if the first pattern matches the whole line, it is the one which will be used.
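Putting the two rules into a complete file, a minimal flex sketch could look like this. %option noyywrap and main are scaffolding assumed here so that the file builds on its own, and leaving the second action empty is just one possible choice.

```lex
%option noyywrap
%{
#include <stdio.h>
%}
%%
((100+1+)|(01))+\n    { fputs(yytext, stdout); /* entire line matched; yytext includes the newline */ }
.*\n?                 { /* any other line, with or without a final newline: discard */ }
%%
int main(void)
{
    yylex();    /* scans stdin until end of file */
    return 0;
}
```

Given the input lines 0101, 01abc and abc01, only 0101 should be echoed, since the other two lines are consumed whole by the second rule.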

The ? at the end of the second rule is to make the newline character optional. In (f)lex, . matches any character other than a newline, so you might think that .*\n will match any line. And indeed it will. However, it is possible (though not strictly correct) for the last line in a text file to be missing the newline terminator. To cover that case we use .*\n?. (F)lex rules never match the empty string, and patterns always match as much as possible, so the only time that rule can match without a newline is if the characters to be matched are exactly at the end of the file, without a newline.

Note the difference between .*\n and .*$. If a pattern ends with $, (f)lex will only use the rule if the next character is a newline. But $ does not match the newline, so it will still be in the input stream waiting to be matched. If you used $ instead of \n, you would need another rule to match (and discard) newline characters. But that might be what you want after all, because flex always reads one character ahead, even if it doesn't need the next character to know what to do. So if you explicitly match the \n with rules like the ones I suggested above, you will find that your scanner isn't very responsive in interactive use; messages will be delayed until the next line is read.
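For comparison, a sketch of the $ variant might look like the following. The third rule is the extra newline rule mentioned above. The use of .+ rather than .* for the fallback, along with %option noyywrap and main, are choices made here for illustration. Since $ only matches just before a newline, a final line with no newline terminator would fall through to the .+ rule in this version and be discarded.

```lex
%option noyywrap
%{
#include <stdio.h>
%}
%%
((100+1+)|(01))+$     { puts(yytext); /* yytext has no newline here; puts() appends one */ }
.+                    { /* any other non-empty line: discard */ }
\n                    { /* discard the newline left in the input by the rules above */ }
%%
int main(void)
{
    yylex();    /* scans stdin until end of file */
    return 0;
}
```

The first rule still wins on a fully matching line because (f)lex counts the trailing newline toward the length of the match when choosing between rules, even though that newline is then returned to the input.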

Upvotes: 1
