End of line lex

I am writing an interpreter for assembly using lex and yacc. The problem is that I need to parse a word that will strictly be at the end of the file. I've read that there is an anchor $, which can help. However it doesn't work as I expected. I've wrote this in my lex file:

ABC$    {printf("QWERTY\n");}

The input file is:

ABC

without spaces or any other invisible symbols. So I expect the outputput to be QWERTY, however what I get is:

ABC

which I guess means that the program couldn't parse it. Then I thought, that $ might be a regular symbol in lex, so I changed the input file into this:

ABC$

So, if $ isn't a special symbol, then it will be parsed as a normal symbol, and the output will be QWERTY. This doesn't happen, the output is:

ABC$

The question is whether $ in lex is a normal symbol or special one.

Upvotes: 2

Views: 2928

Answers (1)

rici
rici

Reputation: 241721

In (f)lex, $ matches zero characters followed by a newline character.

That's different from many regex libraries where $ will match at the end of input. So if your file does not have a newline at the end, as your question indicates (assuming you consider newline to be an invisible character), it won't be matched.

As @sepp2k suggests in a comment, the pattern also won't be matched if the input file happens to use Windows line endings (which consist of the sequence \r\n), unless the generated flex file was compiled for Windows. So if you created the file on Windows and run the flex-generated scanner in a Unix environment, the \r will also cause the pattern to fail to match. In that case, you can use (f)lex's trailing context operator:

ABC/\r?\n   { puts("Matched ABC at the end of a line"); }

See the flex documentation for patterns for a full description of the trailing context operator. (Search for "trailing context" on that page; it's roughly halfway down.) $ is exactly equivalent to /\n.

That still won't match ABC at the very end of the file. Matching strings at the very end of the file is a bit tricky, but it can be done with two patterns if it's ok to recognise the string other than at the end of the file, triggering a different action:

ABC/.    { /* Do nothing. This ABC is not at the end of a line or the file */ }
ABC      { puts("ABC recognised at the end of a line"); }

That works because the first pattern will match as long as there is some non-newline character following ABC. (. matches any character other than a newline. See the above link for details.) If you also need to work with Windows line endings, you'll need to modify the trailing context in the first pattern.

Upvotes: 3

Related Questions