Sarathi
Sarathi

Reputation: 1031

Constructing AST during LR parsing

I have written an LR(1) parser that can successfully parse strings in the language of my grammar into a Concrete Syntax Tree, but I am now trying to construct an Abstract Syntax Tree.

I am using an inheritance design for my AST nodes:

struct ASTNode {
    virtual Type typeCheck() = 0;
}

struct IDNode : public ASTNode {
    string name;
    ...
}

struct INTNode : public ASTNode {
    int value;
    ...
}

struct BOPNode : public ASTNode {
    ASTNode *pLeft;
    ASTNode *pRight;
    ...
}

struct Add_BOPNode : public BOPNode {
    ...
}

struct ParamNode : public ASTNode {
    string name;
    ASTNode *pTypeSpecifier;
    ...
}

struct ParamListNode : public ASTNode {
    vector<ParamNode*> params;
    ...
}

struct FuncDec : public ASTNode {
    string functionName;
    ASTNode *pFunctionBody;
    ASTNode *pReturnType;
    ASTNode *pParams;
    ...
}

When I perform a reduction in my LR(1) parser I generate a new node depending on the rule that was used for the reduction. This is pretty straightforward for most of the nodes, but I'm not sure of a clean way to implement a node that contains a list of other nodes.

Using the ParamListNode from above as an example:

struct stack_item {
    int state;
    int token;
    string data;
    ASTNode *node;
};

/// rule = the number of the rule being reduced on
/// rhs = the items on the right-hand side of the rule

ASTNode* makeNode(int rule, vector<stack_item> rhs) {
    switch(rule) {
        /// <expr> ::= <expr> '+' <term>
        case 1: return new Add_BOPNode(rhs[0].node, rhs[2].node);

        /// <param> ::= IDENT(data) ':' <type>
        case 2: return new ParamNode(rhs[0].data, rhs[2].node);

        /// <param_list> ::= <param>
        case 3: return new ParamList(rhs[0].node);

        /// <param_list> ::= <param_list> ',' <param>
        case 4: {
            auto list = dynamic_cast<ParamListNode*>(rhs[0].node);
            list->params.push_back(rhs[2].node);
            return list;
        }

        ...
    }
}

Since generating a node requires a subclass of ASTNode to be returned, I have to create a subclass that encloses a vector<> with each sub-node. However, since not every node needs to be a list structure, I have to dynamic_cast<> to the subclass before I can access the internal list.

I feel like there should be a cleaner way to handle a list of sub-nodes without having to rely on dynamic_cast<>.

Another question is about the FuncDec node. It has pParams which should be a ParamList (or vector<Param*> directly), but to do that I would have to dynamic_cast<> the incoming ASTNode to a ParamList or Param node. Again, I feel like there should be a way to not use dynamic_cast<>, but I can’t think of one.

Also, if you have any other suggestions about how I can better structure or implement anything that would be greatly appreciated :)

Upvotes: 2

Views: 705

Answers (1)

user1524750
user1524750

Reputation:

My LRSTAR Parser Generator creates an abstract-syntax tree (AST) by using only one class, Node. Each node is the same structure, a pointer to the token (in the symbol table if a leaf node), and pointers to parent, child and next nodes. The next pointer allows you to have a list of nodes (multiple children for a parent node). This has worked well for many years.

During processing of the AST, it is the function associated with the node which takes care of the processing of the node. For example, the add function will do something different than the subtract function. The functions are different, instead of having a different class for each type of node.

Here is the node structure that I use:

  class Node
  {
     public:
     int    id;      // Node id number              
     int    prod;    // Production (rule) number           
     int    sti;     // Symbol-table index (perm or temp var).
     int    prev;    // Previous node.                          
     int    next;    // Next node.                          
     int    line;    // Line number.                      
     int    child;   // Child node.                       
     int    parent;  // Parent node.
  };

Upvotes: 1

Related Questions