Reputation: 29
We are trying to fix a mistake in our lexical analyzer. We are using flex and have to support multi-line strings. The problem is that when the closing " is not on the same line as the opening ", we do not count the newline. There are two kinds of newline to handle: a real end of line inside the string, and a \n escape sequence typed by the programmer. Is there a way to recognize the end of line inside the rule and count it?
Upvotes: 0
Views: 2512
Reputation: 241721
If you use %option yylineno then flex will maintain the line count for you (in the variable yylineno). It will get it right as long as you don't call input().
There are two small problems:
1. The value of yylineno is correct for the last character read, which is the last character in the token. If you have multiline tokens, though, you usually also want to know the line number at the beginning of the token; you will need to store the value of yylineno at the end of the previous token.
Fortunately, you can define the macro YY_USER_ACTION to be some C code; this will be inserted at the beginning of every action (including the rules which don't have any explicit action). So something like this will make sure that the previous value is always available as yylineno_start:
#define YY_USER_ACTION              \
  yylineno_start = yylineno_saved;  \
  yylineno_saved = yylineno;
Of course, you will also need to declare those variables.
2. Flex doesn't track column positions. But perhaps that's ok with you. Otherwise, you can add some more code to the YY_USER_ACTION mentioned above. The simple approach is to also keep a total character count, and to record the total character count at the beginning of the current line. It's easy to maintain the total character count; you just add the value of yyleng each time. To maintain the count at the beginning of the line, you need to check whether the value of yylineno changed, and if so search backwards in the token to find the last newline character. That only sounds inefficient; most of the time, the scan is very short.
Here is a minimal solution using just line number tracking:
%option yylineno
%option noinput nounput noyywrap nodefault
%{
int yylineno_saved = 1;
int yylineno_start = 1;
#define YY_USER_ACTION              \
  yylineno_start = yylineno_saved;  \
  yylineno_saved = yylineno;
%}
%%
[[:space:]]             { /* Ignore whitespace including newlines */ }
[[:digit:]]+            { printf("Integer %s at line %d\n", yytext, yylineno); }
\"(\\(.|\n)|[^\\"])*\"  { printf("String from line %d to line %d\n",
                                 yylineno_start, yylineno);
                        }
.                       { /* Ignore everything else */ }
%%
int main(int argc, char** argv) {
  return yylex();
}
And here is a more sophisticated one which also tracks character position, as suggested by point 2 above. This one uses the yylloc global variable, which is the usual way of communicating complete token boundaries to bison. (Note that this code will not cooperate with either yyless or yymore. If you use those features, you will need to write wrappers for them.)
%option yylineno
%option noinput nounput noyywrap nodefault
%{
/* The following would usually be generated by bison if you
 * enable location tracking in your bison definition.
 */
struct YYLTYPE {
  int first_line;
  int first_column;
  int last_line;
  int last_column;
};
struct YYLTYPE yylloc = {1,1,1,1};
/* We also need to keep the absolute character position, and the
 * position at the beginning of the current line.
 */
int char_position = 0;
int line_start = 0;
#define YY_USER_ACTION                                  \
  char_position += yyleng;                              \
  if (yylineno != yylloc.last_line) {                   \
    char* p = yytext + yyleng;                          \
    line_start = char_position;                         \
    while (*--p != '\n') --line_start;                  \
  }                                                     \
  yylloc.first_line = yylloc.last_line;                 \
  yylloc.first_column = yylloc.last_column;             \
  yylloc.last_line = yylineno;                          \
  yylloc.last_column = char_position - line_start + 1;
/* Just for show */
void show_with_loc(const char* msg) {
  printf("[%d:%d->%d:%d] %s",
         yylloc.first_line, yylloc.first_column,
         yylloc.last_line, yylloc.last_column,
         msg);
}
%}
%%
[[:space:]]             { /* Ignore whitespace including newlines */ }
[[:digit:]]+            { show_with_loc("Integer\n"); }
\"(\\(.|\n)|[^\\"])*\"  { show_with_loc("String\n"); }
.                       { /* Ignore everything else */ }
%%
int main(int argc, char** argv) {
  return yylex();
}
Upvotes: 1
Reputation: 63
You can use the concept of a state, or start condition, in Flex.
1. Enter the <STRING> state whenever you encounter " (double quote).
2. In the <STRING> state you can write a different set of rules. For example, when you get a backslash followed by a newline inside the string, you know it is a multi-line string. You can also detect the newline \n separately inside the <STRING> state.
3. When you encounter another " (double quote), end the <STRING> state and return to the <INITIAL> state.
Source code: string.l
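%option noyywrap
%x STRING
%{
int line_count = 1;
%}
%%
\"              {
                    printf("%d: String started\n", line_count);
                    BEGIN(STRING);
                }
<STRING>"\\\n"  { line_count++; /* backslash followed by a real newline */ }
<STRING>\n      { line_count++; /* unescaped newline inside the string */ }
<STRING>\"      {
                    printf("%d: String ended\n", line_count);
                    BEGIN(INITIAL);
                }
<STRING>"\\n"   { printf("new line\n"); /* the two-character escape \n */ }
<STRING>.       { printf("%s\n", yytext); }
\n              { line_count++; }
.               {}
%%
int main(int argc, char *argv[]) {
    /* Take the input from the file named on the command line. */
    if (argc < 2 || (yyin = fopen(argv[1], "r")) == NULL) {
        fprintf(stderr, "usage: %s input-file\n", argv[0]);
        return 1;
    }
    yylex();
    printf("\nTotal Lines: %d\n", line_count);
    return 0;
}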
Try this input.
"single line"
"multi\
line"
Upvotes: 0