smallguyxoxo

Lex file doesn't recognize keywords. How can I write my regex to match particular keywords?

This is my lex file:

%{
    #include<stdio.h> 
    int codelines = 0;
    int commentlines = 0;
    int blanklines = 0;
    int headerlines = 0;
    int brackets = 0;
    int keywords = 0;

    
%}

%%
int|return|char {keywords++;}
^[ \t]*\n {blanklines++;}
[ a-zA-Z0-9();]+ {codelines++;}
\{|\} {brackets++;}
#[a-zA-Z0-9<>.]+ {headerlines++;}
\/\/[a-zA-Z0-9 *=]* {commentlines++;}
%%

int main(void) {
    yylex();

    printf("\n");
    printf("Number of Code Lines %d\n", codelines);
    printf("Number of Comment Lines %d\n", commentlines);
    printf("Number of Blank Lines %d\n", blanklines);
    printf("Number of Header Lines %d\n", headerlines);
    printf("Number of Braces %d\n", brackets);
    printf("Number of Keywords %d\n", keywords);
}

Here is the input file I'm passing to it:

#include<stdio.h>
                                                                                                                                        
int main() {
    int a;
    int b;
    int c;
    //My name is Witcher
    //Analyzer
    return 0;
}

The output looks like this, which is wrong because none of the keywords are detected:

Number of Code Lines 8
Number of Comment Lines 2
Number of Blank Lines 0
Number of Header Lines 1
Number of Braces 2
Number of Keywords 0

It should be something like this, with the keywords counted:

Number of Code Lines 8
Number of Comment Lines 2
Number of Blank Lines 0
Number of Header Lines 1
Number of Braces 2
Number of Keywords 5

I've tried debugging by adding statements to the keyword rule's action to see when keywords are recognized, but they never run.

Irocha

What is happening is that lex has two important properties here:

  1. Lex matches each piece of input only once. If two different regexes can match the same text, lex picks one of them and the other never fires. So the question is: which one wins?
  2. Lex picks the rule that matches the longest stretch of input. If two rules match the same number of characters, the rule listed first in the file wins.

For example, given these two rules:
<= {printf("Less equal");}
< {printf("Less");}
and the input a<=b, the <= rule fires and prints Less equal rather than Less, because <= is a longer match than plain <.

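In your code, the codelines regex matches the text you want the keywords regex to match. On the line int main() {, for example, [ a-zA-Z0-9();]+ matches "int main() " (11 characters, up to the brace), which is longer than the 3-character match "int", so the codelines rule wins and keywords++ never runs. On lines like "    int a;" the codelines regex begins matching at the leading spaces, where the keyword rule cannot match at all, and carries on through int as part of one longer match.
You should rewrite the codelines regex so that it can no longer swallow the keywords.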
