Reputation: 526
I'm trying to write a syntax highlighter for the C programming language using flex. My problem is that the program stops reading input when it reaches any keyword pattern and somehow gets stuck. (The keyword rule is the very first rule in the rules section.) I have no idea why this is happening, and the regex for keywords seems to look fine.
This is the code:
%{
#include <stdio.h>
#include <string.h>
enum token_type{
KEYWORD,
ID,
INTEGER,
FLOAT_NUMBER,
SOME_CHARACTER,
SOME_STRING,
SPECIAL_CHARACTER,
COMMENT,
MULTILINE_COMMENT,
ENDING_DOUBLE_QUOTE
};
int yy_left_integer;
double yy_left_double;
char* yy_left_string;
%}
%x in_multiline_comment
%x in_string
%option noyywrap
%%
"auto"|"int"|"const"|"short"|"break"|"long"|"continue"|"double"|"struct"|"float"|"unsigned"|"else"|"switch"|"for"|"signed"|"case"|"register"|"default"|"sizeof"|"char"|"return"|"do"|"static"|"void"|"enum"|"typedef"|"goto"|"volatile"|"extern"|"union"|"if"|"while" {yy_left_string = yytext; return KEYWORD;}
"/*" BEGIN(in_multiline_comment);
"//"[^ \n]* {yy_left_string = yytext; return COMMENT;}
[a-zA-Z_][a-zA-Z0-9_]* {yy_left_string = yytext; return ID;}
(("0x")[+-]?[0-9A-F]+) | ([+-]?[0-9]+) {yy_left_integer = atoi(yytext); return INTEGER;}
([+-]?[0-9]*\.[0-9]+)(E[+-]?[0-9]+)? {yy_left_double = atof(yytext); return FLOAT_NUMBER;}
\" {BEGIN(in_string);}
<in_string>{
[\\.?] {yy_left_string = yytext; return SPECIAL_CHARACTER;}
[^\"\\]* {strncpy(yy_left_string, yytext + 1, strlen(yytext -1)); return SOME_STRING;}
\" {yy_left_string = yytext; BEGIN(INITIAL); return ENDING_DOUBLE_QUOTE;}
}
\\(.?) {yy_left_string = yytext; return SPECIAL_CHARACTER;}
\'[^ \']?\' {yy_left_string = yytext; return SOME_CHARACTER;}
<in_multiline_comment>{
"*/" {yy_left_string = yytext; BEGIN(INITIAL); return MULTILINE_COMMENT;}
^[*\n]+
"*"
"\n" yylineno++;
}
[\n] {yylineno++;}
[\t\v] {}
. {yy_left_string = yytext;}
%%
int main(int argc, char** argv)
{
int token;
if(argc > 1){
if(!(yyin = fopen(argv[1], "r"))){
perror(argv[1]);
return 1;
}
}
FILE* highlighted_html_file = fopen("highlighted.html", "w");
if(highlighted_html_file == NULL){
printf("error opening file\n");
return 1;
}
while(token = yylex()){
if(token == KEYWORD){fprintf(highlighted_html_file,"<b><span style=\"color:Blue\">%s</span> </b>", yy_left_string);}
else if(token == ID){fprintf(highlighted_html_file,"<span style = \"color:Orange\"> %s </span>", yy_left_string);}
else if(token == INTEGER){fprintf(highlighted_html_file, "<span style = \"color:Purple\"> %d </span>", yy_left_integer);}
else if(token == FLOAT_NUMBER){fprintf(highlighted_html_file, "<i><span style = \"color:Purple\">%f</span></i>", yy_left_double);}
else if(token == SPECIAL_CHARACTER){fprintf(highlighted_html_file, "<span style = \"color:LightBlue\"> \"%s </span>", yy_left_string);}
else if(token == SOME_STRING){fprintf(highlighted_html_file, "<span style = \"color:Red\"> \"%s", yy_left_string);}
else if(token == ENDING_DOUBLE_QUOTE){fprintf(highlighted_html_file, "<span style = \"color:Red>\"</span>");}
else if(token == SOME_CHARACTER){fprintf(highlighted_html_file, "<span style = \"color:LightRed\"> \"%s </span>", yy_left_string);}
else if(token == COMMENT || token == MULTILINE_COMMENT){fprintf(highlighted_html_file, "<span style = \"color:Grey\"> %s</span>", yy_left_string);}
else {fprintf(highlighted_html_file, "%s", yy_left_string);}
}
}
Upvotes: 0
Views: 498
Reputation: 81
As far as I know, you should not store "yytext" in your variables by plain pointer assignment. You should copy it using strdup or something similar.
Also, the following code is awful:
<in_string>{
[\\.?] {yy_left_string = yytext; return SPECIAL_CHARACTER;}
[^\"\\]* {strncpy(yy_left_string, yytext + 1, strlen(yytext -1)); return SOME_STRING;}
What does this mean? If you process a backslash (for example, the start of \a), you do "yy_left_string = yytext", so yy_left_string is now a "char*" pointing to memory internal to flex. Suppose you then process a normal character, say z. Now you call "strncpy(yy_left_string, ...)", passing that pointer to flex-internal memory as the destination, so you write data into flex internals. This can break everything. You also don't know whether yy_left_string has enough space to hold all the yytext characters, so you can easily run into a segmentation fault.
Do you understand how memory in C works? Do you understand pointers etc? Do you understand string handling in C?
Also, I am not sure about everything I said regarding yytext. That is, I am not sure that yytext really should not be stored in your variables, and I am not sure that writing to the memory pointed to by yytext is a bad idea. Read the flex docs to check all of this.
(Also, your "flex" tag is wrong, because that tag is about a different Flex.)
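The copying advice above can be sketched in plain C. This is only an illustration: copy_token and copy_survives_reuse are hypothetical names, and scanner_buffer is a stand-in for yytext, not real flex state. It shows that a heap copy keeps its contents after the scanner-side buffer is overwritten by the next token.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a NUL-terminated heap copy of the first
   `len` bytes of `text`. The caller owns the result and must free() it. */
static char *copy_token(const char *text, size_t len)
{
    char *copy;

    assert(text != NULL);
    copy = malloc(len + 1);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, text, len);
    copy[len] = '\0';
    return copy;
}

/* Simulates a scanner that reuses one buffer for every token.
   Returns 1 if the saved copy still holds the first token afterwards. */
static int copy_survives_reuse(void)
{
    char scanner_buffer[16] = "while";   /* stand-in for yytext (assumption) */
    char *saved = copy_token(scanner_buffer, strlen(scanner_buffer));
    int ok;

    if (saved == NULL) {
        return 0;
    }
    strcpy(scanner_buffer, "x");         /* the "next token" overwrites the buffer */
    ok = strcmp(saved, "while") == 0 && strcmp(scanner_buffer, "x") == 0;
    free(saved);
    return ok;
}
```

In a flex action this would presumably look like "yy_left_string = copy_token(yytext, yyleng);", with the caller freeing the string once the token has been printed.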
Upvotes: 0
Reputation: 10445
KEYWORD is the first enumerator of enum token_type, so it has the value 0, and your while(token = yylex()) loop terminates as soon as the token is 0. Change
enum token_type{
KEYWORD,
to be:
enum token_type{
KEYWORD = 1,
and this problem will disappear.
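The effect can be reproduced without flex. The sketch below is an illustration only: the enum names, count_consumed, and the two demo functions are hypothetical, and an int array ending in 0 stands in for the values yylex() would return (yylex() returns 0 at end of input).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token sets: the first mirrors the question, the second applies the fix. */
enum token_zero_based { Z_KEYWORD, Z_ID, Z_INTEGER };      /* Z_KEYWORD == 0 */
enum token_one_based  { O_KEYWORD = 1, O_ID, O_INTEGER };  /* O_KEYWORD == 1 */

/* Counts how many tokens a `while ((token = next()))` style loop consumes.
   The loop stops at the first 0 entry, just as the question's loop stops
   when yylex() returns 0. */
static size_t count_consumed(const int *stream)
{
    size_t consumed = 0;

    assert(stream != NULL);
    while (stream[consumed] != 0) {
        consumed++;
    }
    return consumed;
}

/* With KEYWORD == 0, the loop stops at the keyword, not at the real end. */
static size_t demo_zero_based(void)
{
    const int stream[] = { Z_ID, Z_KEYWORD, Z_INTEGER, 0 };
    return count_consumed(stream);
}

/* With KEYWORD == 1, only the terminating 0 ends the loop. */
static size_t demo_one_based(void)
{
    const int stream[] = { O_ID, O_KEYWORD, O_INTEGER, 0 };
    return count_consumed(stream);
}
```

Here demo_zero_based() consumes only the identifier before the keyword ends the loop early, while demo_one_based() consumes all three tokens.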
Upvotes: 2