Reputation: 313
I looked at the ANSI C grammar (lex), and this is the regex for octal constants:
0{D}+{IS}? { count(); return(CONSTANT); }
My question is: why does it accept something like 0898? That is not a valid octal number.
I thought they would have accounted for that, but they just wrote it this way.
Could you explain why? Thank you.
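To see exactly what that rule accepts, here is a small C sketch that hand-matches the pattern `0{D}+{IS}?`. It assumes the definitions `D [0-9]` and `IS (u|U|l|L)*` from the same grammar file; the function name `matches_lex_octal_rule` is hypothetical. Because `{D}` is any decimal digit, 8 and 9 get through.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical hand-written matcher for the lex pattern 0{D}+{IS}?,
   assuming D = [0-9] and IS = (u|U|l|L)*.
   Returns true only if the whole string matches the pattern. */
bool matches_lex_octal_rule(const char *s)
{
    if (*s != '0')
        return false;                 /* must start with a literal '0' */
    s++;
    if (!isdigit((unsigned char)*s))
        return false;                 /* {D}+ needs at least one more digit */
    while (isdigit((unsigned char)*s))
        s++;                          /* any decimal digit, including 8 and 9 */
    while (*s != '\0' && strchr("uUlL", *s) != NULL)
        s++;                          /* optional suffix letters */
    return *s == '\0';                /* nothing may follow */
}
```

With these assumptions `matches_lex_octal_rule("0898")` is true, while a lone `"0"` is rejected (in that grammar it is matched by the decimal rule instead).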
Upvotes: 3
Views: 304
Reputation: 120049
You want reasonable, user-friendly error messages.
If your lexer accepts 0999, you can detect an illegal octal digit and output a reasonable message:
    int x = 0999;
            ^ error: illegal octal digit, go back to school
If it doesn't, it will split this into two separate tokens, 0 and 999, and pass them to the parser. The resulting error messages can be quite confusing:
    int x = 0999;
             ^ error: expected ‘,’ or ‘;’ before numeric constant
The invalid program is rejected either way, as it should be; however, the ostensibly incorrect lex grammar does a better job at error reporting.
This demonstrates that practical grammars built for tools such as lex or yacc do not have to correspond exactly to the ideal grammars found in language definitions.
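A rough sketch of the "accept leniently, then diagnose" approach, written in C as it might run after the lexer returns a CONSTANT token. The names `first_bad_octal_digit` and `report_bad_octal` are hypothetical, and the caller is assumed to know the source line and the token's 0-based column.

```c
#include <stdio.h>

/* Hypothetical post-lexing check: given the text of a CONSTANT token that
   the lenient rule 0{D}+ accepted, return the offset of the first digit
   that is not valid in octal (8 or 9), or -1 if there is none.
   Hex tokens ("0x...") and tokens not starting with '0' are skipped. */
int first_bad_octal_digit(const char *tok)
{
    if (tok[0] != '0' || tok[1] == 'x' || tok[1] == 'X')
        return -1;
    for (int i = 1; tok[i] != '\0'; i++)
        if (tok[i] == '8' || tok[i] == '9')
            return i;
    return -1;
}

/* Print the source line with a caret under the offending digit.
   `col` is the 0-based column where the token starts in `line`. */
void report_bad_octal(const char *line, int col, const char *tok)
{
    int bad = first_bad_octal_digit(tok);
    if (bad < 0)
        return;
    /* "%*s" with an empty string prints col + bad spaces. */
    fprintf(stderr, "%s\n%*s^ error: illegal octal digit '%c'\n",
            line, col + bad, "", tok[bad]);
}
```

For example, `report_bad_octal("int x = 0999;", 8, "0999")` points the caret at the first 9. The precise wording and caret position are a design choice, not anything the lex grammar prescribes.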
Upvotes: 4
Reputation: 134028
The grammar you repeatedly link to in your questions was produced in 1985, four years before the first C standard was published in 1989.
That is not the grammar published in the 1989 standard, which clearly uses:

    octal-constant:
        0
        octal-constant octal-digit
    octal-digit: one of
        0 1 2 3 4 5 6 7
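For comparison, a C sketch of a recognizer for the standard's productions. The left-recursive rule simply means "a 0 followed by zero or more octal digits", so a loop is enough. The function name `is_octal_constant` is hypothetical, and integer suffixes (u, l), which the standard handles in a separate integer-suffix production, are deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical recognizer for the C89 productions
       octal-constant: 0 | octal-constant octal-digit
       octal-digit:    one of 0 1 2 3 4 5 6 7
   Suffixes are not handled here. */
bool is_octal_constant(const char *s)
{
    if (*s != '0')
        return false;               /* every octal-constant starts with 0 */
    for (s++; *s != '\0'; s++)
        if (*s < '0' || *s > '7')   /* digits are contiguous in C, so this is portable */
            return false;
    return true;
}
```

Under this grammar `"0898"` is rejected, while `"0"` and `"0777"` are accepted.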
Even so, that lex grammar is sufficient for tokenizing a valid program.
Upvotes: 3
Reputation: 26753
Keep in mind that this is only syntax, not semantics.
So it is sufficient to detect "this cannot be anything but a constant".
It is not necessary (yet) to detect "this is a correct octal constant".
Note that the rule does not even distinguish between octal, decimal, and hexadecimal constants: all of them are returned as CONSTANT.
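To make the split between syntax and semantics concrete, here is a sketch of a later, semantic pass that takes the text of an integer CONSTANT token and works out its value. The function name `constant_value` is hypothetical. It relies on the standard `strtoul` with base 0, which picks hexadecimal for a `0x` prefix, octal for a leading `0`, and decimal otherwise, and stops at the first character that is not valid for that base. `strtoul` also skips leading whitespace and accepts a sign, which is harmless here because such characters never appear in an integer CONSTANT lexeme.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical semantic check run after the lexer returned CONSTANT.
   Returns true if `lexeme` (an integer constant with optional u/U/l/L
   suffixes) is a valid decimal, octal or hex constant, storing its value. */
bool constant_value(const char *lexeme, unsigned long *out)
{
    char digits[64];
    size_t len = strlen(lexeme);

    /* Drop the integer suffix, which strtoul would not accept. */
    while (len > 0 && strchr("uUlL", lexeme[len - 1]) != NULL)
        len--;
    if (len == 0 || len >= sizeof digits)
        return false;
    memcpy(digits, lexeme, len);
    digits[len] = '\0';

    /* For "0898", base 0 selects octal, so parsing stops at the '8'
       and `end` does not reach the terminating '\0'. */
    char *end;
    errno = 0;
    unsigned long v = strtoul(digits, &end, 0);
    if (*end != '\0' || errno == ERANGE)
        return false;
    *out = v;
    return true;
}
```

In this sketch the lexer stays as permissive as the original rule, and the invalid 0898 is caught only when its value is computed.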
Upvotes: 3