billyjayan

Reputation: 27

Syntax error in lex yacc

Here is my lex/yacc code to parse an XML file and print the contents between the <XML> and </XML> tags.

LEX

%{

%}
%%
"<XML>" {return XMLSTART;}
"</XML>" {return XMLEND;}
[a-z]+ {yylval=strdup(yytext); return TEXT;}
"<" {yylval=strdup(yytext);return yytext[0];}
">" {yylval=strdup(yytext);return yytext[0];}
"\n" {yylval=strdup(yytext);return yytext[0];}
. {}
%%

YACC

%{
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#define YYSTYPE char *
%}
%token XMLSTART
%token XMLEND
%token TEXT
%%
program : XMLSTART '\n' A '\n' XMLEND {printf("%s",$3);
                                        }
A : '<' TEXT '>' '\n' A '\n' '<' TEXT '>'  { $$ = strcat($1,strcat($2,strcat($3,strcat($4,strcat($5,strcat($6,strcat($7,strcat($8,$9))))))));}
  | TEXT
%%
#include"lex.yy.c" 

I'm getting a syntax error. I tried adding ECHO statements in a few places but couldn't locate the problem. The input file I'm using is:

<XML>
<hello>
hi
<hello>
</XML> 

Please help me figure out the error. I have relatively little experience with lex and yacc.

Upvotes: 0

Views: 1381

Answers (1)

rici

Reputation: 241671

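  1. That grammar will only successfully parse a file in which XMLEND is the very last token. However, text files end with a newline, and your lexer returns that final newline as a '\n' token, which the grammar does not accept after XMLEND.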

  2. Although you could presumably fix that by adding a newline at the end of the start rule, it's almost always a bad idea to try to parse whitespace. In general, except for line-oriented languages (which XML is not), it is best to ignore whitespace.

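  3. Your use of strcat is incorrect: each destination string comes from strdup, which allocates only enough space for the original token, so appending to it writes past the end of the buffer. Quoting man strcat from a GNU/Linux system: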

    The strcat() function appends the src string to the dest string, overwriting the terminating null byte ('\0') at the end of dest, and then adds a terminating null byte. The strings may not overlap, and the dest string must have enough space for the result. If dest is not large enough, program behavior is unpredictable; buffer overruns are a favorite avenue for attacking secure programs.

    You might want to use asprintf if it exists in your standard library.

  4. Also, you never free() the strings produced by strdup, so all of them leak memory. In general, it's better not to strdup tokens whose string representation is already known (particularly single-character tokens), but the important thing is to keep track of the tokens whose string value has been freshly allocated. The same applies to semantic values produced with asprintf, if the above suggestion is taken.
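
To make points 3 and 4 concrete, here is a minimal sketch of a concatenation helper that sizes its own buffer instead of appending into a strdup'd token. The name concat_all and its signature are hypothetical (they are not from the question or from any library), and it uses only standard C, so it does not depend on asprintf being available.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of the original post): joins `count`
 * strings into one freshly malloc'd buffer that is large enough for
 * the whole result. Returns NULL if the allocation fails. The caller
 * owns the returned string and must free() it. */
char *concat_all(const char *const parts[], size_t count)
{
    assert(parts != NULL);

    /* First pass: compute the total length, plus one byte for '\0'. */
    size_t total = 1;
    for (size_t i = 0; i < count; i++)
        total += strlen(parts[i]);

    char *result = malloc(total);
    if (result == NULL)
        return NULL;

    /* Second pass: copy each piece to the current write position. */
    char *out = result;
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(parts[i]);
        memcpy(out, parts[i], len);
        out += len;
    }
    *out = '\0';
    return result;
}
```

In the yacc action, the semantic values could be collected into an array and passed to a helper like this one. Each strdup'd input would then be freed once the combined string has been built, so that only the new string remains owned by $$.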

Upvotes: 1
