Solezano
Solezano

Reputation: 15

Rules & Actions for Parser Generator, and

I am trying to wrap my head around an assignment question, therefore I would very highly appreciate any help in the right direction (and not necessarily a complete answer). I am being asked to write the grammar specification for this parser. The specification for the grammar that I must implement can be found here:

http://anoopsarkar.github.io/compilers-class/decafspec.html

Although the documentation is there, I do not understand a few things, such as how to write (in my .y file) things such as

{ identifier },+

I understand that this would mean a comma-separated list of 1 (or more) occurrences of an identifier, however when I write it as such, the compiler displays an error of unrecognized symbols '+' and ',', being mistaken as whitespace. I tried '{' identifier "},+", but I haven't the slightest clue whether that is correct or not.

I have written the lexical analyzer portion (as it was from the previous segment of the assignment) which returns tokens (T_ID, T_PLUS, etc.) accordingly, however there is this new notion that I must assign 'yylval' to be the value of the token itself. To my understanding, this is only necessary if I am in need of the actual value of the token, therefore I would need the value of an identifier token T_ID, but not necessarily the value of T_PLUS, being '+'. This is done by creating a %union in the parser generator file, which I have done, and have provided the tokens that I currently believe would require the literal token value with the proper yylval assignment.

Here is my lexical analysis code (I could not get it to format properly, I apologize): https://pastebin.com/XMZwvWCK

Here is my parser file decafast.y: https://pastebin.com/2jvaBFQh

And here is the final piece of code supplied to me, the C++ code to build an abstract syntax tree at the end: 

https://pastebin.com/ELy53VrW?fbclid=IwAR2cFT_-pGKlVZ2liC-zAe3Fw0BWDlGjrrayqEGV4JuJq1_7nKoe9-TLTlA

To finalize my question, I do not know if I am creating my grammar rules correctly. I have tried my best to follow the specification in the above website, but I can't help but feel that what I am writing is completely wrong. My compiler is spitting out nothing but "warning: rule useless in grammar" for almost every (if not every) rule.

If anyone could help me out and point me in the right direction on how to make any progress, I would highly, highly appreciate it.

Upvotes: 0

Views: 156

Answers (1)

rici
rici

Reputation: 241671

The decaf specification is written in (an) Extended Backus Naur Form (EBNF), which includes a number of convenience operators for repetition, optionality and grouping. These are not part of the bison/yacc syntax, which is pretty well limited to BNF. (Bison/yacc do allow the alternation operator |, but since there is no way to group subpatterns, alteration can only be used at the top-level, to combine two productions for the same non-terminal.)

The short section at the beginning of the specification which describes EBNF includes a grammar for the particular variety of EBNF that is being used. (Since this grammar is itself recursively written in the same EBNF, there is a need to apply a bit of inductive reasoning.) When it says, for example,

CommaList   = "{" Expression "}+," .

it is not saying that "}+," is the peculiar spelling of a comma-repetition operator. What it is saying is that when you see something in the Decaf grammar surrounded by { and }+,, that should be interpreted as describing a comma-separated list.

For example, the Decaf grammar includes:

FieldDecl  = var { identifier }+, Type ";" .

That means that a FieldDecl can be (amongst other possibilities) the token var followed by a comma-separated list of identifier tokens and then followed by a Type and finally a semicolon.

As I said, bison/yacc don't implement the EBNF operators, so you have to find an equivalent yourself. Since BNF doesn't allow any form of grouping -- and a list is a grouped subexpression -- we need to rewrite the subexpression of a production as a new non-terminal. Also, I suppose we need to use the tokens defined in spec (although bison allows a more readable syntax).

So to yacc-ify this EBNF production, we first introducing the new non-terminal and replace the token names:

FieldDecl: T_VAR IdentifierList Type T_SEMICOLON

Which leaves the definition of IdentifierList. Repetition in BNF is always produced with recursion, following a very simple model which uses two productions:

  • the base, which is the simplest possible repetition (usually either nothing or a single list item), and
  • the recursion, which describes a longer possibility by extending a shorter one.

In this case, the list must have at least one item, and we extend by adding a comma and another item:

IdentifierList
    : T_ID                        /* base case */
    | IdentifierList T_COMMA T_ID /* Recursive extension */

The point of this exercise is to develop your skills in thinking grammatically: that is, factoring out the syntax and semantics of the language. So you should try to understand the grammars presented, both for Decaf and for the author's version of EBNF, and avoid blindly copying code (including grammars). Good luck!

Upvotes: 1

Related Questions