Reputation: 43
%{
#include <stdio.h>
int sline=0,mline=0;
%}
%%
"/*"[a-zA-Z0-9 \t\n]*"*/" { mline++; }
"//".* { sline++; }
.|\n { fprintf(yyout,"%s",yytext); }
%%
int main(int argc,char *argv[])
{
if(argc!=3)
{
printf("Invalid number of arguments!\n");
return 1;
}
yyin=fopen(argv[1],"r");
yyout=fopen(argv[2],"w");
yylex();
printf("Single line comments = %d\nMultiline comments=%d\nTotal comments = %d\n",sline,mline,sline+mline);
return 0;
}
I am trying to make a Lex program which would count the number of comment lines (single-line comments and multi-line comments separately).
Using this code, I gave a .c file and a blank text file as input and output arguments.
When a multi-line comment contains any special characters, the pattern does not match it, and mline is not incremented for that comment.
How do I fix this problem?
Upvotes: 1
Views: 1664
Reputation: 76
Below is the full Lex code to count the number of comments, comment lines, and executable lines.
%{
int cc=0,cl=0,el=0,flag=0;
%}
%x cmnt
%%
^[ \t]*"//".*\n {cc++;cl++;}
.+"//".*\n {cc++;cl++;el++;}
^[ \t]*"/*" {BEGIN cmnt;}
<cmnt>\n {cl++;}
<cmnt>.\n {cl++;}
<cmnt>"*/"\n {cl++;cc++;BEGIN 0;}
<cmnt>"*/" {cl++;cc++;BEGIN 0;}
.*"/*".*"*/".+\n {cc++;cl++;}
.+"/*".*"*/".*\n {cc++;cl++;el++;}
.+"/*" {BEGIN cmnt;}
.\n {el++;}
%%
int main(void)
{
yyin=fopen("abc.cpp","r");
yyout=fopen("abc.txt","w");
yylex();
fprintf(yyout,"Comment Count: %d \nCommented Lines: %d \nExecutable Lines: %d",cc,cl,el);
return 0;
}
int yywrap()
{
return 1;
}
The program reads a C++ source file, abc.cpp, as input and writes the output to the file abc.txt.
Upvotes: 0
Reputation: 126193
The problem is your regex for multiline comments:
"/*"[a-zA-Z0-9 \t\n]*"*/"
This only matches multiline comments that ONLY contain letters, digits, spaces, tabs, and newlines. If the comment contains anything else it won't match. You want something like:
"/*"([^*]|"*"+[^*/])*"*"+"/"
This will match anything except a */ between the /* and */.
Upvotes: 1
Reputation: 597
Below is a nudge in the right direction. The main difference between what you did and what I have done is that I defined only two named patterns: one for whitespace and one for ident (identifiers). By identifiers I mean anything that you might want to comment out. This pattern can obviously be expanded to include other characters and symbols. I also defined the three patterns that begin and end comments and associated them with tokens that we could pass to the syntax analyzer (but that's a whole new topic).
I also changed the way that you feed input to the program. I find it cleaner to redirect input to a program from a file and redirect output to another file - if you need this.
Here is an example of how you might use this program:
flex filename.l
g++ lex.yy.c -o lexer
./lexer < input.txt
To redirect the output to another file, use this instead of the last command above:
./lexer < input.txt > output.txt
Note: the '.' (dot) pattern in the last rule is a catch-all for any single character, symbol, etc. that does not match one of the earlier rules.
There are many nuances to pattern matching using regex to match comment lines. For example, this would still match even if the comment line was part of a string.
Ex. " //This is a comment in a string! "
You will need to do a little more work to get past these nuances - like I said, this is a nudge in the right direction.
You can do something similar to this to accomplish your goal:
%{
#include <stdio.h>
int sline = 0;
int mline = 0;
#define T_SLINE 0001
#define T_BEGIN_MLINE 0002
#define T_END_MLINE 0003
#define T_UNKNOWN 0004
%}
WSPACE [ \t\r]+
IDENT [a-zA-Z0-9]
%%
"//" {
printf("TOKEN: T_SLINE LEXEME: %s\n", yytext);
sline++;
return T_SLINE;
}
"/*" {
printf("TOKEN: T_BEGIN_MLINE LEXEME: %s\n", yytext);
return T_BEGIN_MLINE;
}
"*/" {
printf("TOKEN: T_END_MLINE LEXEME: %s\n", yytext);
mline++;
return T_END_MLINE;
}
{IDENT} {/*Do nothing*/}
{WSPACE} { /*Do Nothing*/}
. {
printf("TOKEN: T_UNKNOWN LEXEME: %s\n", yytext);
return T_UNKNOWN;
}
%%
int yywrap(void) { return 1; }
int main(void) {
while ( yylex() );
printf("Single-line comments = %d\nMulti-line comments = %d\nTotal comments = %d\n", sline, mline, (sline + mline));
return 0;
}
Upvotes: 2