Roger Costello

Reputation: 3209

My parser doesn't realize that it should have taken the second branch of a rule

I am creating a parser for files that contain data about airport runways.

You have probably seen numbers at the end of runways, such as:

24L

24L-36R

Read as "24 Left" and "24 Left/36 Right", respectively.

The numbers are called "runway designators".

A runway may be labeled with either a single designator or two designators separated by a hyphen.

Correspondingly, I have this rule (a choice with two branches) in my .y file:

rwydsg: RUNWAY  { $$ = concat(3, "<RunwayDesignator>", $1, "</RunwayDesignator>"); }
 | RUNWAY '-' RUNWAY { $$ = concat(7, "<RunwayDesignator_1>", $1, "</RunwayDesignator_1>", "<Hyphen>-</Hyphen>", "<RunwayDesignator_2>", $3, "</RunwayDesignator_2>"); }
;

The concat() function is a little function that I created. It works fine.

With this input:

24L

I get this output:

<RunwayDesignator>24L</RunwayDesignator>

Perfect!

With this input:

24L-36R

I get no output. Why is that?

Here is my input file:

24R-36L

Here is my ".l" file:

%{
#include "helloworld.tab.h"
%}

RWY [0-9]{2,2}(C|L|R)

%%
"-"             { return yytext[0]; }
{RWY}           { yylval.sval = malloc(yyleng + 1); strcpy(yylval.sval, yytext); return(RUNWAY); }
\n              { return(EOL); }
%%
int yywrap(){ return 1;}

Here is my ".y" file:

%{
#include <stdio.h>
#include "utilities.h"
int yylex(void);
extern FILE *yyin;
void yyerror(const char* msg);
%}

%union
{
  char *sval;
}
%token <sval> RUNWAY
%token EOL

%type <sval> rwydsg

%%

start: rwydsg               { printf("%s", $1); }

rwydsg: RUNWAY              { $$ = concat(3, "<RunwayDesignator>", $1, "</RunwayDesignator>"); }
 | RUNWAY '-' RUNWAY        { $$ = concat(7, "<RunwayDesignator><RunwayDesignator_1>", $1, "</RunwayDesignator_1>", "<Hyphen>-</Hyphen>", "<RunwayDesignator_2>", $3, "</RunwayDesignator_2></RunwayDesignator>"); }
;

%%

int main(int argc, char *argv[])
{
    yyin = fopen(argv[1], "r");
    yyparse();
    return 0;
}

void yyerror(const char *msg)
{
  fprintf(stderr, "error: %s\n", msg);
}

Here is my concat() function:

char *concat(int numargs, ...)
{
    va_list ap;
    // The va_start macro enables access to the variable arguments following the 
    // last fixed argument (in this case, the last fixed argument is named numargs).
    va_start(ap, numargs);
    // T va_arg(va_list ap, T)
    // The va_arg macro expands to an expression of type T that corresponds to the 
    // next value in ap 
    size_t size = 0;
    for (int i=0; i<numargs; i++) {
      size += sizeof va_arg(ap, T);
    }
    size++;
    char *buffer = calloc(1, size);
    for (int i=0; i<numargs; i++) {
      strcat(buffer, va_arg(ap, T));
    }
    // End traversal of the variable arguments
    va_end(ap);
    return (char *)buffer;
}

Upvotes: 0

Views: 61

Answers (1)

sepp2k

Reputation: 370092

Your concat function is broken.

First of all, size must be initialized to 0 before the loop accumulates into it; an uninitialized size leads to undefined behavior.

Second, it uses sizeof va_arg(ap, T), which is the same as just sizeof(T), to calculate the size of buffer. You didn't include the definition of T, but assuming it's a typedef for char* (which is the only way the code makes any sense), each iteration just adds the size of a pointer to size (8 on most 64-bit platforms, or whatever it is on yours). That is often less than the actual length of the strings, which you'd need strlen to find, so you end up with a buffer that's too small to hold all the characters you're trying to write into it.
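For illustration, here is a minimal sketch of a concat that sizes the buffer with strlen instead of sizeof. It assumes every variadic argument is a non-NULL char* (so T is replaced by char* directly), and the NULL return on allocation failure is my addition, not something in the original. Note that sizeof never evaluates its operand, so the original first loop did not actually advance ap; once strlen really calls va_arg, the traversal has to be restarted with va_end/va_start before the second pass.

```c
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Concatenate numargs C strings into one freshly allocated buffer.
 * Assumes every variadic argument is a non-NULL char *.
 * Returns NULL if allocation fails; the caller frees the result. */
char *concat(int numargs, ...)
{
    va_list ap;
    size_t size = 1;                /* room for the terminating '\0' */

    /* First pass: measure the strings themselves, not the pointers. */
    va_start(ap, numargs);
    for (int i = 0; i < numargs; i++) {
        size += strlen(va_arg(ap, char *));
    }
    va_end(ap);

    /* calloc zero-fills, so buffer starts out as an empty string. */
    char *buffer = calloc(1, size);
    if (buffer == NULL) {
        return NULL;
    }

    /* Second pass: restart the traversal before reading the arguments again. */
    va_start(ap, numargs);
    for (int i = 0; i < numargs; i++) {
        strcat(buffer, va_arg(ap, char *));
    }
    va_end(ap);

    return buffer;
}
```

Assuming 8-byte pointers, the original sizing gives the three-argument call 3 * 8 + 1 = 25 bytes for a 40-character result, and the seven-argument call in the .y file 7 * 8 + 1 = 57 bytes for a 143-character result. Both calls overflow the buffer; the first one only appeared to work.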

Also you never do anything with the string produced by rwydsg, so it really shouldn't produce any output on either input. But I imagine you just left that part out of the code you posted here.

Upvotes: 1
