Reputation: 1135
I'm parsing files containing words that need to be spell-checked. The input is in RTF format.
The words to be checked contain \- sequences that mark the predetermined breaking points where a word may be hyphenated and wrapped if the line would otherwise get too long.
For example, the input is ei\-ner. The lexer should recognize 'einer', ignoring the '\-' completely.
The task boils down to:
parse the input for occurrences of '\-',
eliminate each '\-' from the input stream (yyin/yytext) and hand the filtered input stream back to the lexer.
Or, in other words: is it possible to make flex ignore a pattern entirely?
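As far as I can tell, a flex pattern always matches a contiguous run of input, so a rule cannot skip characters in the middle of a token. One possible workaround is to match the whole word including its '\-' markers and strip them in the action before the spell check. The sketch below shows such a stripping step in plain C; the name strip_soft_hyphens and the rule in the comment are hypothetical and not part of flex.

```c
#include <stddef.h>

/*
 * Hypothetical helper: remove every "\-" pair from the NUL-terminated
 * string s in place and return the new length. The string only ever
 * gets shorter, so no extra buffer is needed.
 *
 * A flex rule might use it roughly like this (assumed pattern):
 *   [[:alpha:]]+(\\-[[:alpha:]]+)*   { strip_soft_hyphens(yytext); ... }
 */
static size_t strip_soft_hyphens(char *s)
{
    char *dst = s;
    const char *src = s;

    while (*src != '\0') {
        if (src[0] == '\\' && src[1] == '-') {
            src += 2;            /* skip the two-character marker */
        } else {
            *dst++ = *src++;     /* keep everything else */
        }
    }
    *dst = '\0';
    return (size_t)(dst - s);
}
```

With this approach the rule no longer matches the literal "einer"; it matches any word and the action would compare or look up the stripped text, which seems to fit a spell checker anyway.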
Upvotes: 0
Views: 121
Reputation: 1135
I believe I have an idea: I could override YY_INPUT.
In lex.yy.c it is defined as:
#ifndef YY_INPUT
#define YY_INPUT(buf,result,max_size) \
    if ( YY_CURRENT_BUFFER_LVALUE->yy_is_interactive ) \
        { \
        int c = '*'; \
        int n; \
        for ( n = 0; n < max_size && \
                 (c = getc( yyin )) != EOF && c != '\n'; ++n ) \
            buf[n] = (char) c; \
        if ( c == '\n' ) \
            buf[n++] = (char) c; \
        if ( c == EOF && ferror( yyin ) ) \
            YY_FATAL_ERROR( "input in flex scanner failed" ); \
        result = n; \
        } \
    else \
        { \
        errno=0; \
        while ( (result = (int) fread(buf, 1, (yy_size_t) max_size, yyin)) == 0 && ferror(yyin)) \
            { \
            if( errno != EINTR) \
                { \
                YY_FATAL_ERROR( "input in flex scanner failed" ); \
                break; \
                } \
            errno=0; \
            clearerr(yyin); \
            } \
        }\
\

#endif
and put my input filtering in there. The only problem I see: when I "eat up" two characters from the input and flex asks for max_size characters, how do I tell the caller that there are two fewer in the buffer?
EDIT: maybe I just adjust result so the caller is informed of that, whatever it then does with it.
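Adjusting result looks like the right approach: YY_INPUT is expected to place up to max_size characters in buf and set result to the number actually stored, with 0 signalling end of input, so storing fewer than requested is fine. Below is a minimal sketch of such an override, assuming the filter reads character by character. The helper name read_skipping_soft_hyphens is hypothetical; in a real scanner both the helper and the #define would go into the %{ ... %} block of the .l file.

```c
#include <stdio.h>

/*
 * Hypothetical helper: copy up to max_size bytes from `in` into `buf`,
 * dropping every "\-" pair. Returns the number of bytes stored;
 * 0 means end of input. Lookahead uses ungetc(), so a marker that
 * straddles two calls is still handled by the stream itself.
 */
static int read_skipping_soft_hyphens(FILE *in, char *buf, int max_size)
{
    int n = 0;

    while (n < max_size) {
        int c = getc(in);
        if (c == EOF)
            break;
        if (c == '\\') {
            int next = getc(in);
            if (next == '-')
                continue;            /* drop both characters */
            if (next != EOF)
                ungetc(next, in);    /* not a marker: push it back */
        }
        buf[n++] = (char)c;
    }
    return n;
}

/* Assumed override; flex only defines its own YY_INPUT if none exists. */
#define YY_INPUT(buf, result, max_size) \
    ((result) = read_skipping_soft_hyphens(yyin, (buf), (int)(max_size)))
```

Two caveats with this sketch: it bypasses the interactive branch of the default macro, and it treats any backslash followed by '-' as a marker, so an escaped backslash followed by a hyphen in the RTF would be mangled.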
EDIT: example
Let l1.l be:
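%{
#include <stdio.h>
#include <stdlib.h>
int yylex(void);
int main(int argc, char* argv[])
{
    /* Guard against a missing file argument before calling fopen(). */
    if (argc < 2 || (yyin = fopen(argv[1], "r")) == NULL) {
        fprintf(stderr, "can't open %s\n", argc < 2 ? "(no file given)" : argv[1]);
        exit(EXIT_FAILURE);
    }
    /* Run the scanner until it reports end of input. */
    while (yylex())
        ;
    return 0;
}
%}
%option noyywrap
%%
"einer" { fprintf(yyout,"einer found"); }
%%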
And the input file einer.txt:
ei\-ner
What I would like to recognize is:
einer found
Upvotes: 0