Krischu

Reputation: 1135

Ignoring characters in lex/flex input, or: understanding YY_INPUT

I'm parsing files containing words that need to be spell-checked. The input is in RTF format. The words to be checked contain \- sequences, which mark the optional break points where a word may be hyphenated if the line would otherwise get too long.

The input is, e.g., ei\-ner. The lexer should recognize 'einer', ignoring the '\-' entirely.

The task could be boiled down to:

Scan the input for occurrences of '\-', remove them from the input stream (yyin/yytext), and hand the filtered stream back to the lexer.

Or in other words: is it possible to make flex ignore a pattern completely?

Upvotes: 0

Views: 121

Answers (1)

Krischu

Reputation: 1135

I believe I've got the idea: I could override YY_INPUT. In lex.yy.c it is defined as:

#ifndef YY_INPUT
#define YY_INPUT(buf,result,max_size) \
    if ( YY_CURRENT_BUFFER_LVALUE->yy_is_interactive ) \
        { \
        int c = '*'; \
        int n; \
        for ( n = 0; n < max_size && \
                 (c = getc( yyin )) != EOF && c != '\n'; ++n ) \
            buf[n] = (char) c; \
        if ( c == '\n' ) \
            buf[n++] = (char) c; \
        if ( c == EOF && ferror( yyin ) ) \
            YY_FATAL_ERROR( "input in flex scanner failed" ); \
        result = n; \
        } \
    else \
        { \
        errno=0; \
        while ( (result = (int) fread(buf, 1, (yy_size_t) max_size, yyin)) == 0 && ferror(yyin)) \
            { \
            if( errno != EINTR) \
                { \
                YY_FATAL_ERROR( "input in flex scanner failed" ); \
                break; \
                } \
            errno=0; \
            clearerr(yyin); \
            } \
        }\
\

#endif

and put my input filtering in there. The only problem I see: when I "eat up" two characters from the input and flex asks for max_size characters, how do I tell the caller that there are two fewer in the buffer?

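EDIT: Maybe I just set result to the number of characters actually stored in buf, so the caller is informed. As far as I can tell from the flex manual, YY_INPUT only has to deliver up to max_size characters, so a shorter count should be acceptable.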
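A minimal sketch of what such a filter could look like, written as a plain C function so it can be tried outside of flex. The name read_without_soft_hyphens is made up for illustration and is not part of flex. It reads character by character with getc instead of fread, which is an assumption made for simplicity rather than speed, and it does not treat an escaped backslash (\\ followed by -) specially.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of flex): read up to max_size characters
 * from `in` into buf, dropping every RTF optional hyphen "\-".
 * Returns the number of characters stored; 0 means end of input.
 *
 * Possible hook in the definitions section of the .l file (untested sketch):
 *   #define YY_INPUT(buf, result, max_size) \
 *       (result) = read_without_soft_hyphens((buf), (int)(max_size), yyin)
 */
static int read_without_soft_hyphens(char *buf, int max_size, FILE *in)
{
    int n = 0;

    while (n < max_size) {
        int c = getc(in);
        if (c == EOF)
            break;
        if (c == '\\') {
            int next = getc(in);
            if (next == '-')
                continue;          /* swallow both characters of "\-" */
            if (next != EOF)
                ungetc(next, in);  /* not a soft hyphen: re-read it next turn */
        }
        buf[n++] = (char)c;
    }
    return n;
}
```

Because the function peeks one character ahead and pushes it back with ungetc, a "\-" is removed even when it would otherwise straddle two buffer refills. The return value is simply the count of characters kept, which is what YY_INPUT expects in result.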

EDIT: example

Let l1.l be:

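%{
#include <stdio.h>
#include <stdlib.h>   /* exit, EXIT_FAILURE */

int yylex(void);

int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        exit(EXIT_FAILURE);
    }
    if ((yyin = fopen(argv[1], "r")) == NULL) {
        fprintf(stderr, "can't open %s\n", argv[1]);
        exit(EXIT_FAILURE);
    }
    /* yylex() returns 0 at end of input */
    while (yylex())
        ;
    return 0;
}
%}
%option noyywrap
%%
"einer" { fprintf(yyout, "einer found"); }
%%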

And the input file einer.txt:

ei\-ner 

What I would like to recognize is:

einer found

Upvotes: 0
