Reputation: 333
I try to print an AST representation from the calculator example of PackCC.
I added a "Node" object and functions to write a dot file.
That seems to works correctly with '+' and '-' expressions, but the result is unexpected when I mix '+' and '*'.
Notice these two operations has different precedence levels if it can help you.
This is the grammar file.
%prefix "calc"
%value "Node*" # The type of "$$" values.
###############################################################################
%header{
typedef struct Node {
const char* data;
struct Node* l_child;
struct Node* r_child;
}Node;
Node* node(const char* text, Node* left, Node* right);
void dot_graph(char* name, Node* root);
}
###############################################################################
%source {
#include <stdio.h>
#include <stdlib.h>
Node* node(const char* text, Node* left, Node* right)
{
Node* res = malloc(sizeof (Node));
res->data = text;
res->l_child = left;
res->r_child = right;
return res;
}
void fill_dot(FILE* f, Node* n, Node* parent, int parentNum)
{
static int currentNum; // This number will be printed to make unique names.
// Lines that represents nodes.
if (parent == NULL) {
fprintf(f, " node%d [label=\"%s\"]\n", currentNum, n->data);
}
else {
fprintf(f, " node%d [label=\"%s\"]\n", currentNum, n->data);
fprintf(f, " node%d -- node%d\n", parentNum, currentNum);
parentNum += 1;
}
currentNum += 1;
if (n->l_child != NULL) {
fill_dot(f, n->l_child, n, parentNum);
}
if (n->r_child != NULL) {
fill_dot(f, n->r_child, n, parentNum);
}
parentNum -= 1;
if( parentNum == -1) // Hopefully the end of the recursion.
currentNum = 0;
}
void dot_graph(char* name, Node* root)
{
FILE* res = fopen(name, "w");
if (res == NULL) {
printf("File problem: %s\n", name);
exit(1);
}
fprintf(res, "graph {\n"); // Opens the dot file
fill_dot(res, root, NULL, 0); // fills with the nodes.
fprintf(res, "}\n"); // Close the dot file.
fclose(res);
}
}
###############################################################################
statement <- _ e:expression _ EOL { $$ = e; puts("Expression parsed");}
/ ( !EOL . )* EOL { printf("error\n"); }
expression <- e:term { $$ = e; }
term <- l:term _ < '+' > _ r:factor { $$ = node(strdup($1), l, r); }
/ l:term _ < '-' > _ r:factor { $$ = node(strdup($2), l, r); }
/ e:factor { $$ = e; }
factor <- l:factor _ < '*' > _ r:unary { $$ = node(strdup($1), l, r); }
/ l:factor _ < '/' > _ r:unary { $$ = node(strdup($2), l, r); }
/ e:unary { $$ = e; }
unary <- < '+' > _ e:unary { $$ = node(strdup($1), e, NULL ); }
/ < '-' > _ e:unary { $$ = node(strdup($2), e, NULL ); }
/ e:primary { $$ = e; }
primary <- < [0-9]+ > { $$ = node(strdup($1), NULL, NULL); }
/ '(' _ e:expression _ ')' { $$ = e; }
_ <- [ \t]*
EOL <- '\n' / '\r\n' / '\r' / ';'
%%
int main() {
Node* root = node("", NULL, NULL);
calc_context_t *ctx = calc_create(NULL);
while (calc_parse(ctx, &root)) { // The root node will be "extended" by childs during parsing.
}
dot_graph("calc.dot", root);
calc_destroy(ctx);
// No "free" for the nodes to shorten this file.
return 0;
}
The commands to run packcc, compile and run the parser.
./packcc calc.peg && gcc calc.c && ./a.out
The command to watch the graph if you are on linux.
dot -Tx11 calc.dot
I wonder if the problem is in the parser actions or in my painfully designed dot printer.
Upvotes: 1
Views: 662
Reputation: 241731
It's the way you walk the trees. You're not correctly identifying the parent node during your scan. As far as I can see, it has nothing to do with the parse (and hence packcc), and the results and the code I present below were created without using packcc or the grammar at all; I just created a few trees by hand using your node
function. That's usually a better way to debug, as well as being a better way to create minimal examples, because it helps clarify what is irrelevant to the problem (in this case, the parser, which is quite a lot of irrelevant code).
Here's what your function produces, with the correct lines on the right (produced with diff --side-by-side so that you can see the differences):
-- YOUR CODE -- -- CORRECT --
graph { graph {
node0 [label="+"] node0 [label="+"]
node1 [label="4"] node1 [label="4"]
node0 -- node1 node0 -- node1
node2 [label="*"] node2 [label="*"]
node0 -- node2 node0 -- node2
node3 [label="5"] node3 [label="5"]
node1 -- node3 | node2 -- node3
node4 [label="6"] node4 [label="6"]
node1 -- node4 | node2 -- node4
} }
Everything is fine except that node1
is used as the parent of node3
and node4
instead of node2
. Of course, this is a very simple graph. There would be more errors with a larger graph. (I suspect the reason it worked with addition was simply that left associativity produced a graph leaning to the left. Indeed, if you draw a graph for 4*5+6
instead of 4+5*6
, you won't see any problem. (By the way, it's great that you show everything including build instructions. You just missed the input strings :-( But I figured them out.)
"Painfully designed" isn't a bad description. It's a bit of a Rube Goldberg. As a rule of thumb, anytime you find yourself using a static variable in a recursion, you should immediately stop and rethink your approach. It's never the correct solution. If you need to pass data around, use arguments and return values. It's almost always simpler and clearer, and it is always less fragile and more thread-safe.
With that in mind, here's the simple recursive printer. I eliminated the parent
argument since you're just using it as a flag:
/* If parent is -1, this is the root node. Otherwise, it's the nodeid of
* the parent node. nextid is the next node id to use for a child, if any.
* Returns the next id to use.
*/
static int gv_helper(FILE* f, Node* node, int parent, int nextid) {
if (node == NULL) return nextid;
int id = nextid++; /* Get an id for this node */
/* Show this node */
fprintf(f, " node%d [label=\"%s\"]\n", id, node->data);
if (parent >= 0)
fprintf(f, " node%d -- node%d\n", parent, id);
/* Recurse over the children. Advance nextid as we go. */
nextid = gv_helper(f, node->l_child, id, nextid);
return gv_helper(f, node->r_child, id, nextid);
}
/* The public entry avoids having to pass magical arguments. But sometimes it
* would be better to give the client access to all the arguments.
*/
int gv_print(FILE* f, Node* root) {
return gv_helper(f, root, -1, 0);
}
Rather than independently incrementing the node ID and the parent ID (which cannot work, anyway), the code simply observes that the node id assigned in the call is the parent id for the recursive calls on the children. In this way, the recursion helps make the solution clearer, rather than dependent on hard-to-analyse state.
Upvotes: 1