TYFA

Reputation: 313

Octal digit in ANSI C grammar (lex)

I was looking at the ANSI C grammar (lex).

This is the regex it uses for octal constants:

0{D}+{IS}?      { count(); return(CONSTANT); }

My question is: why does it accept something like 0898?

That is not an octal constant, since 8 and 9 are not octal digits.

I thought the grammar would account for that, but the rule is simply written this way.

Could you explain why that is? Thank you.

Upvotes: 3

Views: 304

Answers (3)

n. m. could be an AI

Reputation: 120049

You want reasonable, user-friendly error messages.

If your lexer accepts 0999, you can detect an illegal octal digit and output a reasonable message:

 int x = 0999;
          ^
 error: illegal octal digit, go back to school

If it doesn't, it will parse this as two separate tokens 0 and 999 and pass them to the parser. The resulting error messages could be quite confusing.

 int x = 0999;
          ^
 error: expected ‘,’ or ‘;’ before numeric constant

The invalid program is rejected either way, as it should be. However, the ostensibly incorrect lex grammar does a better job of error reporting.
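As a minimal sketch of how such a diagnostic could be produced, a lexer action for the permissive rule might scan the matched text for the first digit outside 0..7. The helper name `first_bad_octal_digit` is hypothetical and not part of the grammar in question; it only assumes that the token text starts with the digits and that any integer suffix follows them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: given the text of a token matched by 0{D}+{IS}?,
 * return the index of the first digit that is not a valid octal digit,
 * or -1 if every digit is in 0..7. Scanning stops at the first
 * non-digit character, i.e. at the integer suffix or the end of text. */
int first_bad_octal_digit(const char *text)
{
    assert(text != NULL);
    for (size_t i = 0; text[i] >= '0' && text[i] <= '9'; i++) {
        if (text[i] > '7')
            return (int)i;  /* position to report to the user */
    }
    return -1;
}
```

For the token 0999 this returns 1, which is the column offset a compiler could use to place the caret under the offending digit.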

This demonstrates that practical grammars built for tools such as lex or yacc do not have to correspond exactly to ideal grammars found in language definitions.

Upvotes: 4

The grammar you repeatedly link to in your questions was produced in 1985, 4 years prior to the publication of the first C standard revision in 1989.

That is not the grammar that was published in the standard of 1989, which clearly uses

octal-constant:

  • 0

  • octal-constant octal-digit

octal-digit: one of

  • 0 1 2 3 4 5 6 7

Even then, that Lex grammar is sufficient for tokenizing a valid program.
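The difference between the two rules can be sketched in C. This is only an illustration: the function names are hypothetical, the optional suffix `{IS}` is left out, and `{D}` is assumed to mean a decimal digit 0..9.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Standard-style rule: a '0' followed by zero or more digits in 0..7. */
bool is_standard_octal(const char *s)
{
    assert(s != NULL);
    if (s[0] != '0')
        return false;
    for (size_t i = 1; s[i] != '\0'; i++) {
        if (s[i] < '0' || s[i] > '7')
            return false;
    }
    return true;
}

/* The lex rule 0{D}+ without its suffix part:
 * a '0' followed by one or more decimal digits. */
bool matches_lex_rule(const char *s)
{
    assert(s != NULL);
    if (s[0] != '0' || s[1] == '\0')
        return false;
    for (size_t i = 1; s[i] != '\0'; i++) {
        if (s[i] < '0' || s[i] > '9')
            return false;
    }
    return true;
}
```

Any octal constant with at least two digits, such as 017, passes both checks, while 0898 passes only the lex rule. A lone 0 does not match this particular rule, so a complete lexer would need some other rule to tokenize it.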

Upvotes: 3

Yunnosch

Reputation: 26753

Keep in mind that this is only syntax, not semantics.
So it is sufficient to detect "Cannot be anything but a constant.".
It is not necessary (yet) to detect "A correct octal constant.".

Note that it does not even distinguish between octal, decimal, and hexadecimal. All of them register as "CONSTANT".

Upvotes: 3
