Umang Agrawal

Reputation: 43

Lex program rules not working

%{
    #include <stdio.h>
    int sline=0,mline=0;
%}

%%
    "/*"[a-zA-Z0-9 \t\n]*"*/" { mline++; }
    "//".* { sline++; }
    .|\n { fprintf(yyout,"%s",yytext); }
%%

int main(int argc,char *argv[])
{
    if(argc!=3)
    {
        printf("Invalid number of arguments!\n");
        return 1;
    }
    yyin=fopen(argv[1],"r");
    yyout=fopen(argv[2],"w");
    yylex();
    printf("Single line comments = %d\nMultiline comments=%d\nTotal comments = %d\n",sline,mline,sline+mline);
    return 0;
}    

I am trying to make a Lex program which would count the number of comment lines (single-line comments and multi-line comments separately).

Using this code, I passed a .c file and a blank text file as the input and output arguments.
When a multi-line comment contains any special characters, the rule does not match it and mline is not incremented for that comment.

How do I fix this problem?

Upvotes: 1

Views: 1664

Answers (3)

Suraj P Patil

Reputation: 76

Below is the full lex code to count the number of comment lines and executable lines.

%{
int cc=0,cl=0,el=0,flag=0;  
%}
%x cmnt 
%%

^[ \t]*"//".*\n {cc++;cl++;}
.+"//".*\n {cc++;cl++;el++;}

^[ \t]*"/*" {BEGIN cmnt;}
<cmnt>\n {cl++;}
<cmnt>.\n {cl++;}
<cmnt>"*/"\n {cl++;cc++;BEGIN 0;}
<cmnt>"*/" {cl++;cc++;BEGIN 0;}
.*"/*".*"*/".+\n {cc++;cl++;}
.+"/*".*"*/".*\n {cc++;cl++;el++;}
.+"/*" {BEGIN cmnt;}
.\n {el++;}

%%

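int main(void)
{
/* Input and output file names are hard-coded. */
yyin=fopen("abc.cpp","r");
yyout=fopen("abc.txt","w");
if(!yyin || !yyout)
{
fprintf(stderr,"Could not open abc.cpp or abc.txt\n");
return 1;
}
yylex();

fprintf(yyout,"Comment Count: %d \nCommented Lines: %d \nExecutable Lines: %d",cc,cl,el);
return 0;
}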

int yywrap()
{
return 1;
}

The program takes a C++ source file, abc.cpp, as input and writes the output to the file abc.txt (the file is opened in "w" mode, so any previous contents are overwritten).

Upvotes: 0

Chris Dodd

Reputation: 126193

The problem is your regex for multiline comments:

"/*"[a-zA-Z0-9 \t\n]*"*/"

This only matches multiline comments that ONLY contain letters, digits, spaces, tabs, and newlines. If the comment contains anything else it won't match. You want something like:

"/*"([^*]|"*"+[^*/])*"*"+"/"

This matches the opening /*, then any text that does not contain */, then the closing */.
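
To see how such a pattern consumes a comment, here is a hand-written C sketch that walks through the same three pieces: the opening /*, a body made of non-star characters or runs of stars not followed by /, and a closing run of stars followed by /. The names match_block_comment and count_block_comments are made up for this illustration; they are not part of lex or flex, and the scanner flex generates works differently internally.

```c
#include <stddef.h>

// Returns the length of the block comment that starts at s, or 0 if s does
// not start with a complete block comment. Hypothetical helper that mirrors
// the pieces of the pattern one at a time.
size_t match_block_comment(const char *s)
{
    size_t i = 2;

    if (s[0] != '/' || s[1] != '*')
        return 0;                     // the opening "/*" is required

    for (;;) {
        // [^*] : skip every character that is not a star
        while (s[i] != '\0' && s[i] != '*')
            i++;
        if (s[i] == '\0')
            return 0;                 // input ended: unterminated comment

        // "*"+ : consume one or more stars
        while (s[i] == '*')
            i++;
        if (s[i] == '/')
            return i + 1;             // stars followed by '/' close the comment
        if (s[i] == '\0')
            return 0;                 // input ended right after the stars

        i++;                          // stars followed by another character:
                                      // still inside the comment body
    }
}

// Counts complete block comments anywhere in src.
int count_block_comments(const char *src)
{
    int count = 0;

    while (*src != '\0') {
        size_t len = match_block_comment(src);
        if (len > 0) {
            count++;
            src += len;               // jump past the whole comment
        } else {
            src++;                    // not a comment start, move on
        }
    }
    return count;
}
```

Because the body accepts any character, a comment such as /* x+y; #$% */ is counted, which is exactly the case the letters-and-digits character class misses.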

Upvotes: 1

larrylampco

Reputation: 597

Below is a nudge in the right direction. The main difference between your code and mine is that I use only two named regular expressions: one for whitespace and one for ident (identifiers). By identifiers I mean anything that you want to comment out; this regex can be expanded to include other characters and symbols. I also defined the three patterns that begin and end comments and associated them with tokens that could be passed to a syntax analyzer (but that's a whole new topic).

I also changed the way that you feed input to the program. I find it cleaner to redirect input to a program from a file and redirect output to another file - if you need this.

Here is an example of how you might use this program:

flex filename.l
g++ lex.yy.c -o lexer
./lexer < input.txt

To redirect the output to another file, replace the last command above with:

./lexer < input.txt > output.txt

Note: the '.' (dot) rule at the end of the rules section is a catch-all. It matches any single character (other than a newline) that no earlier rule matches.

There are many nuances to matching comments with regular expressions. For example, these rules would still match a comment marker that appears inside a string literal.

Ex. " //This is a comment in a string! "

You will need to do a little more work to get past these nuances - like I said, this is a nudge in the right direction.

You can do something similar to this to accomplish your goal:

%{
    #include <stdio.h>
    int sline = 0;
    int mline = 0;

    #define     T_SLINE         0001
    #define     T_BEGIN_MLINE   0002
    #define     T_END_MLINE     0003
    #define     T_UNKNOWN       0004
%}

WSPACE      [ \t\r\n]+
IDENT       [a-zA-Z0-9]

%%

"//"    {
            printf("TOKEN: T_SLINE   LEXEME: %s\n", yytext);
            sline++;
            return T_SLINE;
        }
"/*"    {
            printf("TOKEN: T_BEGIN_MLINE   LEXEME: %s\n", yytext);
            return T_BEGIN_MLINE;
        }
"*/"    {   
            printf("TOKEN: T_END_MLINE   LEXEME: %s\n", yytext);
            mline++;
            return T_END_MLINE;
        }
{IDENT} {/*Do nothing*/}
{WSPACE} { /*Do Nothing*/}

.       {
            printf("TOKEN: UNKNOWN   LEXEME: %s\n", yytext);
            return T_UNKNOWN;
        }

%%
int yywrap(void)    { return 1; }

int main(void) {
    while ( yylex() );
    printf("Single-line comments = %d\n  Multi-line comments = %d\n  Total comments = %d\n", sline, mline, (sline + mline));

    return 0;
}
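
As for the string nuance mentioned earlier, one way to handle it is to skip over string and character literals before looking for comment markers. The following is a minimal hand-written C sketch of that idea, not a lex solution. The struct comment_counts and the function count_comments are hypothetical names invented here, and the sketch assumes ordinary C-style literals with backslash escapes (no raw strings, no line continuations).

```c
#include <stddef.h>

// Hypothetical result type: one counter per kind of comment.
struct comment_counts {
    int single_line;
    int multi_line;
};

// Counts // and /* */ comments in src, ignoring comment markers that
// appear inside "..." or '...' literals.
struct comment_counts count_comments(const char *src)
{
    struct comment_counts c = {0, 0};
    size_t i = 0;

    while (src[i] != '\0') {
        if (src[i] == '"' || src[i] == '\'') {
            // Literal: skip to the matching quote, honoring backslash escapes.
            char quote = src[i++];
            while (src[i] != '\0' && src[i] != quote) {
                if (src[i] == '\\' && src[i + 1] != '\0')
                    i++;              // skip the escaped character
                i++;
            }
            if (src[i] == quote)
                i++;                  // consume the closing quote
        } else if (src[i] == '/' && src[i + 1] == '/') {
            // Single-line comment: runs to the end of the line.
            c.single_line++;
            while (src[i] != '\0' && src[i] != '\n')
                i++;
        } else if (src[i] == '/' && src[i + 1] == '*') {
            // Multi-line comment: runs to the first "*/".
            i += 2;
            while (src[i] != '\0' && !(src[i] == '*' && src[i + 1] == '/'))
                i++;
            if (src[i] != '\0') {
                c.multi_line++;       // counted only when it is closed
                i += 2;
            }
        } else {
            i++;
        }
    }
    return c;
}
```

With this approach, a line such as s = " //This is a comment in a string! "; contributes nothing to either counter. In lex the same effect could be had by adding a rule for string literals ahead of the comment rules, so the longer string match wins.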

Upvotes: 2
